Deep Learning: Beyond the Hype

Captions
So, I'm Magnus Nordin, technical director at SEED. SEED is an R&D department of Electronic Arts, and we try to look a bit further ahead than a typical game team can: they usually have a horizon of one, two, or three years, while we can look three to five years out, or even further. We do a lot of game AI and deep learning, we also do rendering, graphics, virtual humans and avatars, and we have a group that builds prototypes and new game experiences.

I'll talk about deep learning beyond the hype today. First I'll show you a few use cases of deep learning, and then some possibilities; this will be the sci-fi part, about what we probably can do in the future that we can't do today. Then I'll do a deep dive into deep reinforcement learning and look at some results, because without results it would still only be hype. Finally we'll talk about some of the difficulties of doing deep reinforcement learning, and why it's worth doing anyway. One sign of the hype, I guess, is that I'm required to have this disclaimer now, for the first time in a presentation; it says that what we are doing is safe AI, okay.

Okay, so neural networks. Deep learning is neural networks, deep neural networks. A neural network is really just a function estimator: it can take almost any input and produce almost any output. In this case the input is ten million pixels and the output is the label "cat". That is of course a very complex function, and the nice thing is that we don't have to define this function ourselves; it's trainable, so we train the function instead, and that's why it's called learning.

We can get more elaborate computer vision, where we get a description of a picture instead, by combining one computer vision network with a language generation network. We can recognize voice, we can generate voice, we can create music, and we can generate images from descriptions; in that case the text is the input and the image is the output. And we can play games, which is of course the most important part for us.

So let's look at some use cases. The first one is, like I said, playing games. We can play board games, as AlphaGo did; I guess most people have heard about that one, and I'll talk more about it later. Then there was the moment when OpenAI's Dota agent challenged one of the best players in the world. It's just one versus one, a limited version of Dota, but it was still very impressive, and they could only do this with learning: no human has been able to program this without cheating, using just the same information a human player has.

We can also do pose estimation. Why is this important? Today we do this in big motion capture studios, but this technology will soon let anyone with a mobile phone do motion capture at home. Of course the precision is not as high as a professional motion capture studio yet, but I'm sure we will get there.

And voice: most people know that voice generation, especially the latest work coming out of Google and Baidu and others, is amazingly good, so I won't actually show you any voice generation samples, because you already know that. But I'll show you another interesting thing. "We are encouraged by the news." "We are encouraged by the news." The first sentence was the input to a network; the second sentence was the output of the network. So it converted one voice into another voice.
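As a concrete illustration of the "trainable function estimator" framing at the start of this section (pixels in, a label such as "cat" out), here is a minimal, hedged sketch in PyTorch. The architecture, data, and hyperparameters are placeholders for illustration, not any model discussed in the talk.

```python
# A minimal sketch of the "trainable function" idea: a small image classifier
# that maps pixels to a label. Network shape, dataset and hyperparameters are
# illustrative assumptions, not the models discussed in the talk.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, num_classes),  # logits for e.g. "cat" / "not cat"
        )

    def forward(self, x):
        return self.net(x)

model = TinyClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a dummy batch: we never write the pixels-to-label
# function ourselves, we only push its parameters toward examples we show it.
images = torch.rand(8, 3, 64, 64)    # stand-in for real photos
labels = torch.randint(0, 2, (8,))   # stand-in for human annotations
loss = loss_fn(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The point is only that the mapping from pixels to label is never written by hand; gradient descent adjusts the parameters from examples.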
This is of course very interesting for us. We record a lot of voice in some of our games, and if you want to change one line of those voice recordings, which can be hours, or hundreds of hours in some cases, of old voice, you have to bring the actor back, and that might not be possible. So what if we could have someone else speak, and generate that actor's voice instead? And of course, if you have your Lord of the Rings MMO, you don't want Galadriel to sound like a twelve-year-old; with this technique she can actually sound like Galadriel. Let's hear two more examples. "Who was the mystery MP?" "Who was the mystery MP?" "It was a breathtaking moment." "It was a breathtaking moment." This, by the way, is work by Google DeepMind, and I will mention DeepMind a lot, because they, OpenAI, NVIDIA, and a few others have very good research teams that create a lot of the new deep learning work.

We can also make a neural network sing: the input to this network is the lyrics and the musical notes, and the singing is the output.

More things we can do with voice: animating faces from voice. This work is from NVIDIA Research. To the left here we see the fully motion-captured face, made in a capture studio; to the right we have a neural network animating the face using the voice as the only input. "Are those Eurasian footwear, cowboy chaps, or jolly earthmoving headgear?" "This is my reality and this is the reality of my people." It's not perfect, but it's very good; it's better than most other voice-to-face solutions I've seen.

Then we have content generation, procedural content generation. This is an example from Anastasia at SEED, and everything in this picture is procedurally generated, so you have parameters to change everything, to get whatever variations you want of this scene. We can also use the same approach, and maybe that's more common, for natural things like trees. In this case Anastasia used biologically based rules to generate trees as they would actually look in nature, and as you see we get a lot of variants that really do look like natural trees. But of course this requires a lot of work: it's hard to come up with these rules, and you also have to tweak the parameters to actually get the result you want. So what if we could train something to create these rules and parameters?

Let's look at one of the best samples I've seen; this is also from the same group that did the voice-to-face animation. None of these people are real. They were created by a neural network that has learned the rules for faces by looking at thirty thousand people, thirty thousand celebrities, which you can probably tell, but none of these images is in that training set; they are all new people. And the thing is, I said procedural content: it's parameterized rules, so you can change the parameters, and in this case they glide through, interpolate through, parameter sets of the faces.

So this is pretty cool, and of course this is 2D generation, which is super impressive, but we would like 3D generation. 3D generation is not really there yet, but there's lots of research going on, and I think within a couple of years we will have useful 3D generation using generative networks.
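The "glide through parameter sets" behind the generated faces can be sketched as interpolation in a generator's latent space. This is a hedged toy sketch under that assumption; the `Generator` module below is a stand-in, not the network from the celebrity-faces work.

```python
# Hedged sketch: a generator maps a latent vector to an image, and
# interpolating between two latent vectors produces a smooth morph between
# two faces. The toy decoder below is far smaller than real face generators.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        # Toy decoder: latent vector -> 3x64x64 image.
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, 3 * 64 * 64),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z).view(-1, 3, 64, 64)

gen = Generator()
z_a = torch.randn(1, 128)   # latent "parameters" of face A
z_b = torch.randn(1, 128)   # latent "parameters" of face B

# Sample frames along the straight line between the two latent codes.
frames = [gen((1 - t) * z_a + t * z_b) for t in torch.linspace(0, 1, steps=10)]
```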
But when you use procedural content generation to generate worlds, they often end up quite lifeless, so you need to generate life in them. Emergent behavior and life is also something we can generate with machine learning. This is also from DeepMind: the only goal this agent has is to move forward. It can control the muscles in its little body, and it has to solve the problem of just moving forward and keep moving forward. It gets harder: here's a harder body to control, and yes, sometimes it fails. We actually had a guy at DICE do a more spiced-up version of this, so let's take a look at that instead. If anyone out there is working with sound, you know how important sound is; it's super important.

So, possibilities. We have now seen a lot of individual features that can be used, but what if we combine a few of these features and look at something that's pure sci-fi today, but that I'm sure is possible within a couple of years? Most games today are very violent, and why is that? Because violence is the simplest interaction to create in a game. It's actually much harder to do social interaction, and that's a pity, because when I asked a few people what their best gaming experiences ever were, a surprising number of them answered with something like this clip of Vin Diesel playing a pen-and-paper role-playing game: "A woman walks up to you and says: that power you hold, it's strange, ancient. What are you?" "I'm a witch hunter." "...You see her flesh extend as her arms grow." Have you ever seen Vin Diesel this happy?

So, old-fashioned pen-and-paper role-playing: it is of course the social interaction you get around the table that we really can't do in games today. But it requires a lot of imagination; the players have to visualize everything in front of them, so that's something we should be able to help with. Some people have tried to help with live role-playing, but that's definitely not for everyone. We in the game industry came up with massively multiplayer online role-playing games; however, most of them ended up like this. You can of course do social interaction in World of Warcraft, but it's mostly a highly complex exercise in coordination and intricate game mechanics, so not much role-playing there. So we have this concept we call true role-playing, because we think it's important. Board gaming is more popular than ever, ordinary board games have never been more popular, and I think one of the reasons is the social interaction you can have around the table.

This is an experiment we did in our lab a while back. It's a skinning experiment in VR: you can control the character, it's a Battlefield 4 character, and we can change characters. This is just a prototype, but there's lots of animation technology work going on with deep learning as well that will enable this to become really good soon. And if we take this skinning and add the voice conversion I showed you, then you can look and sound like a character of your choice. Of course we need the face animation as well, to actually animate the skin to say what you say.

So I'll show you two scenes that we absolutely can't do today. Imagine that this is you and your friends in virtual reality, doing some role-playing. Here we go: "If you're going to play with the big dogs..." "No fear, I'm in." "That's you, Echo." "I just want to say it's been a while since we opened the books, and in regards to you guys..." For being scenes in a TV show, there's nothing special about these scenes at all.
But if we even try to imagine doing this in a game, in a multiplayer setting with many players, it's impossible today. I think that by combining the techniques I mentioned it can be done; it will be hard, and there are a few years left before it is realizable, but then it will open up a whole new genre of games, more like amateur theater and true role-playing.

Okay, let's start looking at game AI and reinforcement learning. A short introduction to reinforcement learning: we have an agent, that's the intelligent thing, and it's acting in an environment. From the environment it gets observations and rewards back: when it does something good it gets rewards, and it can also observe what happens when it acts. It usually has a goal as well, typically just to optimize the rewards, to get as high a reward as possible. Reinforcement learning is of course an old technique, but the new thing is that we combine it with a neural network and get deep reinforcement learning. This is learning by doing; it's the same way that both animals and humans learn.

So let's look at the simplest possible example of reinforcement learning. This is a very simple game: the blue dot here is our hero, and he, or it (I have a hard time saying he or she about this), is supposed to eat the green dots and avoid the red dots. First we just drop it into the world, and it's not going well: the score is negative and going down. It hits green dots sometimes, but it's also hitting red dots. After just a few minutes of reinforcement learning, this is what it looks like: it's now trying to hide in the corner and dash out to eat some greens now and then. It's far from optimal, but at least the score is now positive and going up. And after a couple of hours of training it looks like this: now it's definitely playing this game at a superhuman level. We have tried to play it, and there's no way to play it with this efficiency. That's the simplest possible reinforcement learning game I could come up with.

Of course, my interest in this started when I saw this, now more than five years ago, when DeepMind started playing Atari games just from the pixels. They just looked at the pixels and the score, and they learned to play 57 different Atari games. They didn't play all of them equally well, but nowadays most of them are definitely played better than a human. This is amazing, but our games are a lot more complicated; a lot has happened in 40 years. So how do we go about actually playing AAA games, where any modern 3D game at least is much more complicated?

Here's an early example of things that can go wrong, from when we tried to do this over a year back. It's very simple: the little guys here, running on the road, are supposed to capture the road between the walls, and there's one single guy coming up there by the house who is on the opposite team, even though it's hard to see here. Let's see what happens. So what happened? Of course they're pretty stupid, we didn't have very strong networks when we did this, but they also don't have hearing. Every one of them was looking in one direction; they see this low-resolution view of the world. So what we did was add a small hearing radar, the thing you see in the lower left corner. It's very crude and very short range, but at least it gives them an indication that someone is behind them, and that helped a lot.
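A minimal sketch of the agent-environment-reward loop described above, assuming a toy grid world with one "green" and one "red" cell and tabular Q-learning. It is illustrative only, not the actual blue-dot demo.

```python
# Minimal sketch of the reinforcement-learning loop: an agent acts in an
# environment, receives rewards, and adjusts its behaviour to maximise reward.
# The grid, rewards and hyperparameters are assumptions for illustration.
import random

GRID = 5                        # 5x5 world
GOAL, TRAP = (4, 4), (2, 2)     # "green dot" and "red dot"
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
q = {}                          # Q-table: (state, action_index) -> value

def step(state, action):
    x = min(max(state[0] + action[0], 0), GRID - 1)
    y = min(max(state[1] + action[1], 0), GRID - 1)
    nxt = (x, y)
    if nxt == GOAL:
        return nxt, 1.0, True    # reward for eating a green dot
    if nxt == TRAP:
        return nxt, -1.0, True   # penalty for hitting a red dot
    return nxt, 0.0, False

for episode in range(2000):
    state, done = (0, 0), False
    while not done:
        # epsilon-greedy: mostly exploit what we learned, sometimes explore
        if random.random() < 0.1:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))
        nxt, reward, done = step(state, ACTIONS[a])
        best_next = max(q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
        # one-step Q-learning update: "learning by doing"
        q[(state, a)] = q.get((state, a), 0.0) + 0.1 * (
            reward + 0.95 * best_next - q.get((state, a), 0.0))
        state = nxt
```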
Another cool thing, which says something about the power of neural networks: the agents had learned using the 3D vision view here, and we just slapped this 2D radar on top of the 3D view. We made no other changes, and it picked up immediately what it was supposed to do with that radar. We had to retrain it, of course, but nothing else.

Another problem with playing a real game is multi-action. When we play real games, especially on a console or with keyboard and mouse, we use a lot of simultaneous actions. In all of the game playing we have seen, this is from the Atari paper, the agents can only perform one action at a time. The primitive actions are few, and all possible combinations are created as new actions, combination actions like "go forward and press the button", for example. That's fine when you have an old Atari controller; it doesn't work today. A PS4 controller has around 20 inputs, and if you take all the possible combinations of 20 inputs, that's around two million combinations, so we can't create new actions for all the possible combinations; the action space is too large. Unfortunately, and once again this is from Battlefield (we're in the same building as DICE in Stockholm, by the way, which is why we do a lot of Battlefield), you can't play Battlefield without using a lot of simultaneous actions. So we had to solve that problem, because if you just start allowing simultaneous actions for the agent, it will just button-mash: it will press half of them on average, and it takes an enormously long time to learn even the most basic things when you do that.

So what we did, and don't worry, I won't go through the details here, was some imitation learning. We had the agent watch 30 minutes of human gameplay before starting, or actually simultaneously with the reinforcement learning, and just for the beginning: we decayed the amount of imitation learning over time, over maybe the first hour of training. What this did was help the agent understand which combinations are valuable, which combinations of the controller make sense. When we did that, that's the green line; I won't go into details once again, but the higher you get here, the more score you get, the better, and as you can see, when we added imitation learning to this multi-action agent it behaved much, much better.

So let's look at some results, how the agent behaved. When we moved on from Atari games we couldn't jump directly into a real game, so we built a very small, simple FPS game, essentially. Let's start the movie and look at it. The green guy here is the agent, and as you can see he has 12 different actions and can perform all of them simultaneously. The goal here is this blue circle: to protect the blue circle. We'll see it here soon; yes, that's the objective area. That's what the trained agent is trying to do: it's trying to find the circles, which move around every 30 seconds or every minute. There are also health pickups and ammunition pickups, and of course there are opposing bots: he's alone against 10 opposing bots that use classical AI techniques. He has a much higher rate of fire, though, so it's still not impossible.

This is what the agent sees; this is the only input it has. It also has access to its health and ammo, but otherwise it's just the visual input, and it's a low-resolution input.
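A hedged sketch of the idea just described: mixing an imitation (behavior cloning) term with the reinforcement learning loss and decaying it early in training, so the agent learns which of the many simultaneous button combinations make sense. The loss form and decay schedule below are assumptions, not SEED's exact formulation.

```python
# Hedged sketch of mixing imitation learning with RL and decaying the
# imitation term over time. The schedule and loss form are placeholders.
import torch

def imitation_weight(step, decay_steps=50_000):
    # linear decay from 1.0 to 0.0 over the early part of training
    return max(0.0, 1.0 - step / decay_steps)

def combined_loss(rl_loss, policy_logits, demo_actions, step):
    # behaviour cloning on multi-binary actions: each controller input is an
    # independent pressed / not-pressed prediction
    bc_loss = torch.nn.functional.binary_cross_entropy_with_logits(
        policy_logits, demo_actions.float())
    return rl_loss + imitation_weight(step) * bc_loss

# Example: 12 simultaneous buttons, a batch of 4 recorded human frames.
logits = torch.zeros(4, 12, requires_grad=True)
demo = torch.randint(0, 2, (4, 12))
loss = combined_loss(torch.tensor(0.5), logits, demo, step=10_000)
loss.backward()
```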
You can also see the hearing radar; the blue dot in the hearing radar indicates the direction to the objective area in case the agent can't see it.

Another thing that surprised us was the navigation capability. There's no navigation system here, no nav meshes or anything, but it still navigates this maze of houses to find the objective without problems. Once it reaches the objective area it has a few behaviors as well: it of course starts defending the area, and it also patrols the area. The hardest thing for it to learn was supplies. When it runs out of ammo, which it did just now, it immediately prioritizes finding ammo before anything else: you can see that it ignores the enemies, ignores the objective area, runs to the green box, and then turns around. This scanning behavior was also something it discovered after a while, to search more efficiently. All of these behaviors were emergent; we didn't say that it should do this or that, we only gave it the objective of protecting the area.

Another cool thing about these agents is that they generalize very well. This is exactly the same agent; of course it's a new action space, new buttons to push, but otherwise it's exactly the same agent. It's a very simple racing game, and it only took a few minutes for the agent to learn how to lap this circuit. So they can solve a lot of different problems, though of course you have to train them for each problem right now.

So what about reinforcement learning in AAA games? We have collaborated with DICE to try to do this in Battlefield 1, and the case study we set out to do was automated testing. Battlefield 1 is a very complicated game: 64 players, four character classes, there's infantry play, vehicle play, airplanes, horses, zeppelins, lots of game modes, lots of maps, and this needs to be tested for essentially every new build. It's a nightmare for QA. So we thought: what if we could help by having self-learning agents play? Not all 64 players, but if you fill the game up with maybe 50 agents and have 10 humans or so, that would let us scale up testing a lot.

Before going further and showing you how this turned out: we actually tried to use rendered observations first, but the visuals of Battlefield are far too complex for the small vision networks we use (we use almost the same network that played the Atari games), so just seeing the difference between uniforms in Battlefield was too hard. Instead we use a simplified observation. The blue in this observation is obstacles, and we use ray casting to find those, so we don't really render it. The red marks are the enemies, and those are at a higher resolution: it's 12 by 12 for the obstacles and 128 by 128 for the enemies, and we also have the hearing radar, with roughly a 20-meter range. The main reward for the agents was the score from the game, so we're trying to maximize the score, but to get them started we also introduced a waypoint: both teams try to go to the same waypoint, which also helps them meet each other on a large map. We also added supplies; the agents don't have the actions "heal" and "resupply", so we added the boxes from the previous game to Battlefield.

The last agent we saw was alone against classical AI bots; in this case the agents learn by self-play, so both teams are agents, and the second team's brain is an older version of the first team's brain.
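A hedged sketch of the simplified observation described above: a coarse 12x12 obstacle grid filled by ray casts, a 128x128 enemy map, and a short-range hearing radar. The function names and encodings are assumptions; the talk only gives the channels and resolutions.

```python
# Hedged sketch of building the simplified Battlefield observation.
import numpy as np

def build_observation(ray_hits, enemy_positions, heard_directions):
    obstacles = np.zeros((12, 12), dtype=np.float32)
    for cell in ray_hits:              # grid cells where a ray cast hit geometry
        obstacles[cell] = 1.0

    enemies = np.zeros((128, 128), dtype=np.float32)
    for px in enemy_positions:         # higher-resolution cells containing an enemy
        enemies[px] = 1.0

    radar = np.zeros(16, dtype=np.float32)
    for sector in heard_directions:    # short-range hearing, one bin per direction
        radar[sector] = 1.0

    return {"obstacles": obstacles, "enemies": enemies, "radar": radar}

obs = build_observation(ray_hits=[(3, 4), (3, 5)],
                        enemy_positions=[(60, 70)],
                        heard_directions=[2])
```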
So why don't we use the best brain for both teams? Well, because of this: "A strange game. The only winning move is not to play." This literally happened. When we had the same brain for both teams, they discovered: if I don't shoot, I don't get shot. So they stopped shooting and just went around picking up boxes. We actually had to freeze one of the brains to be an older version, so the two teams are not the same. Another thing we did was to introduce a few of the really stupid testing bots we have onto each side as well, and those actually shoot, so the agents have to defend themselves from the beginning.

So does it work? Well, we can look at this clip first. "The enemy is in the lead." This is not exactly fine Battlefield gameplay. They end up doing this when they don't have an obvious way of getting a reward: they don't see an enemy, they are already close to the waypoint, and they have no need to pick up supplies, so they have nothing else to do; they circle around until they find something to do. Let's look at some more successful moments. This is programmer art, or programmer video recording rather; no professional video editor has been involved, so apologies for the shaky camera movements. It's about a two-minute clip, and every player you see here is controlled by a neural network. So it works. To my knowledge, this is the first time anyone has been able to play a first-person immersive modern game with deep reinforcement learning.

Of course there are lots of challenges with doing this as well; it isn't easy. One of the biggest problems is slow training. The agents you just saw had trained for six days, and we used eight machines in parallel to play the game, so it amounts to about 15,000 game rounds, or 300 days of gameplay if you count all of the agents' experience. So it's definitely slow going. There's also behavioral design: as I said, you do reward shaping, you have to design these rewards to get the agents to do what you want them to do. In this case it was mostly the Battlefield score, but it's hard for a game designer to go through reward shaping to get the behavior they need from an agent, so that's a hard part. You don't have full control; you will be surprised by the behavior, and that can be both good and bad. It's very hard to debug a neural network, we have discovered. There's also the question of how we integrate this with classical AI systems like behavior trees, because right now all of the agent's behavior is controlled by the neural network. I don't think that's what will happen first in real games; I think some parts of the behavior will be controlled by neural networks, and that means we need to integrate this into current AI systems. And then there's execution. It's actually not a huge problem, but the GPU is typically busy doing graphics, so all the agents run on the CPU now. Inference, that is, actually running a trained agent, is much cheaper than training it, an order of magnitude cheaper, so right now it's not a huge problem.
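On the execution-cost point above, a hedged sketch of what running a trained agent on the CPU at game time can look like: a single forward pass per decision tick with gradients disabled. The policy shape and the checkpoint path are placeholders, not SEED's runtime code.

```python
# Hedged sketch of inference at game time: no gradients, one forward pass per
# AI tick. Policy shape and checkpoint are illustrative placeholders.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 12))
# policy.load_state_dict(torch.load("trained_policy.pt"))  # hypothetical checkpoint
policy.eval()

@torch.no_grad()
def act(observation):
    logits = policy(torch.as_tensor(observation, dtype=torch.float32))
    # 12 simultaneous button probabilities -> pressed if above 0.5
    return (torch.sigmoid(logits) > 0.5).tolist()

buttons = act([0.0] * 64)   # called once per AI tick, cheap enough for the CPU
```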
If some of you have been to a few of our rendering talks, you might have seen this image already. This is a small project called PICA PICA, built on top of an R&D game engine called Halcyon. It's mostly for rendering research, but while we were building a new game engine, why not make sure that it's good at training agents, that it's fast and has all the mechanisms needed to train agents? These little yellow robots, I'll show you a short trailer for this project, are trying to repair machines. This is a much simpler task than the Battlefield task we just saw, but right now it's just there to get some life into this rendered environment. Let's have a look.

In this environment it's fast, we get a lot of different rendering modes that the agent can use, and it has very fast communication with the brains, which are typically written in Python; the game itself is in C++. We'll continue making sure this engine has good support for self-learning, because speed is very important: the more data you can get when you're training, the faster you can learn. In this case we see 36 agents training in the same process, and we can actually run a few of these processes on one machine, so we can get a lot of machines training in parallel.
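A hedged sketch of why running many agents in one process helps training speed: observations from all environment instances can be batched into one policy call per tick. The `Env` class and policy below are stand-ins; in Halcyon the simulation side lives in C++ and the brains in Python.

```python
# Hedged sketch of batched ("vectorized") environment stepping: one policy
# call per tick for all agents in the process, instead of one per agent.
import numpy as np

class Env:
    def __init__(self):
        self.obs = np.zeros(64, dtype=np.float32)
    def step(self, action):
        self.obs = np.random.rand(64).astype(np.float32)  # placeholder dynamics
        reward = float(action == 0)                        # placeholder reward
        return self.obs, reward

def policy_batch(obs_batch):
    # stand-in for a single batched neural-network inference over all agents
    return np.argmax(obs_batch[:, :4], axis=1)

envs = [Env() for _ in range(36)]            # 36 agents training in one process
obs_batch = np.stack([e.obs for e in envs])
for tick in range(100):
    actions = policy_batch(obs_batch)
    results = [e.step(a) for e, a in zip(envs, actions)]
    obs_batch = np.stack([o for o, _ in results])
    rewards = [r for _, r in results]
```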
So, the hype. I promised to talk a bit about the hype, and one sign of it is that I counted more than 20 talks at GDC this year that have to do with deep learning or machine learning, which I think is a record. Of course, much of the hype is about artificial general intelligence, and that's not really what we are doing, but let's talk about it for a short while. That is when a computer is as good as a human at most tasks. Will there ever be artificial general intelligence? I think that's quite easy to answer if you believe two things: that technological progress will continue, and that intelligence is not magic, that it is biologically based, it's physics, it's not supernatural in some way. Technological progress might not continue, there might be a third world war or something, but if these two things hold, then we will eventually have artificial general intelligence, according to my belief at least.

Then the question is when. A couple of years ago there was a questionnaire at a conference where a lot of experts guessed when we would have AGI, and the median (I say average here, but it is the median) guess was somewhere in the 2040s, so a little more than 20 years away. Of course this is a wild guess; it's very hard to make these kinds of predictions about new technology. In 1902 the Wright brothers said it would take 50 years before humans would fly; in 1903 they flew. In 2015 people believed it would take at least another 10 or 15 years before a computer program would be a Go master; in 2016 AlphaGo did it. But of course we still don't have those flying cars we were promised in the 50s. It's very hard to make this kind of prediction, but let's go with 20 years away for now.

The hype doesn't necessarily have to be about artificial general intelligence, and there are some negative consequences. There's a lot of exaggeration going around, the idea that AI will solve every problem; it probably won't, not for a long time, because it's still very hard to do. Every startup is now an AI startup, or a blockchain startup, or, even better, an AI blockchain startup. The label AI is very overused; every feature is suddenly AI. I saw a thermostat that could lower the temperature in your house during the night, and that was an "AI thermostat". Another problem, maybe not noticeable if you're not in academia: it works in practice but not in theory. We really don't know why deep learning works as well as it does, it shouldn't work this well, and we don't have the theory for it yet, which means that building a neural network architecture and tuning the hyperparameters and everything is more of an art than a science right now. That's not a huge problem in practice, because if it works, it works, but if we had the theory it would of course be much easier: we could actually calculate what the network architecture should be instead of guessing and trying.

Then there are the AI winters. A lot of naysayers think we will soon have a new winter. We had winters during the 80s and 90s, where we reached a plateau: we could solve toy problems, but nothing really interesting. Of course we might end up there again, but I'm more hopeful this time; I'll come back to that at the end of the talk. We have taken a big step up now and can solve a lot more than toy problems. And of course there are people who think that AI should be banned. I saw that one union here in the U.S. wants to ban all deliveries with automated trucks and drones, for example, so that debate has already started.

So let's get back to narrow AI, not general AI. Deep reinforcement learning is hard, and there have been some great blog posts about this recently, such as "Deep Reinforcement Learning Doesn't Work Yet". They bring up a lot of great points that are still unsolved. It only works well for games and simulations, they say, and that's very lucky for us, because we are of course in gaming. One problem is the reward shaping I mentioned before. This is work from OpenAI: the boat is supposed to go around the track. It's not doing that, it's going in the wrong direction, but it's still winning the game, because it has found these green things it gets a lot of score for. You can see on the small radar in the upper left the other players going around as they're supposed to, but this boat actually wins the game, because it has a much higher score than the others when the game ends. It has found an exploit, and that's actually something these agents are very useful for: finding exploits in games. They have found bugs and exploits that had been unknown for 40 years in Atari games. But it's also a problem, because reward shaping, as I said, is hard. We don't want lots of specific rewards to get a certain behavior out of the agent; it's too hard to do, and it takes a long time of testing.

So how do we try to solve that? In this case, from a recent paper by DeepMind, the robot is supposed to first stack the blocks and then clean up and put them in a box, opening the lid of the box. This is a hard problem, because usually you have to give a lot of small rewards: first for moving the arm toward the block, then gripping the block, then opening the lid, then gripping the block again and dropping it in, and then you're finished. Every one of these sub-tasks consists of maybe a hundred small actions, motor actions, to actually achieve. What they have done here is hierarchical reinforcement learning: the agent learns to use higher-level actions instead. For example, "move to block" or "grip" each consist of a lot of small motor actions, but it has learned to use a sequence of those higher-level actions, and then the problem becomes much simpler. Then we can simply give it a reward once it has cleaned up and closed the box: one point of reward, and no other rewards. That's a very hard problem, but I think we have to solve it to make this feasible, or at least much simpler than it is today.
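A hedged sketch of the hierarchical idea just described: a high-level policy chooses among options such as "move to block" or "grip", each of which expands into many primitive motor actions, while the only reward arrives at the very end. The option names and the dummy environment are illustrative, not the DeepMind setup.

```python
# Hedged sketch of hierarchical RL: a policy over options, each option a long
# sequence of low-level motor commands, with one sparse reward at the end.
import random

OPTIONS = {
    "move_to_block": ["motor_step"] * 40,
    "grip_block":    ["close_finger"] * 20,
    "open_lid":      ["motor_step"] * 30 + ["lift"] * 10,
    "drop_block":    ["open_finger"] * 20,
}

def high_level_policy(state):
    # placeholder for the learned controller over options
    return random.choice(list(OPTIONS))

def run_episode(env_step, max_options=20):
    state, total_reward = "start", 0.0
    for _ in range(max_options):
        option = high_level_policy(state)
        for motor_action in OPTIONS[option]:     # 20-40 primitive actions each
            state, reward, done = env_step(state, motor_action)
            total_reward += reward               # stays 0 until the box is closed
            if done:
                return total_reward
    return total_reward

def dummy_env_step(state, motor_action, _t=[0]):
    _t[0] += 1
    done = _t[0] >= 100
    return state, (1.0 if done else 0.0), done   # single reward at the end

print(run_episode(dummy_env_step))
```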
Another problem: if you look at this curve, the horizontal axis is tens of millions of steps. It takes a long time, like I said. This is called sample efficiency: we don't use the data efficiently enough, so the agents are very slow to learn. And why is that? Take a newborn kitten as an example. Kittens are not great at playing games either, because they are born blind, they don't really know how to move, and they know nothing about gravity until they fall a couple of times, and so on. They have to learn everything from scratch, and that's also what our agent does. Every time we train it, it starts from scratch; it has to relearn its visual system and so on, so it takes a long time, and that's not strange. What we need is transfer learning: we need to be able to take an agent that has already learned a few things and continue learning from there, but that's also a hard problem.

There was actually a recent experiment on human priors. To the right here you have a simple platformer, and there is so much we take for granted: that you can climb the ladder, that you can stand on the gray stuff, that you should avoid the pointy stuff and the monsters, and so on. Up to the right there's a key, and we know that a key is usually used in a door, so a human player has no problem opening the door. To the left is what the computer sees, or an approximation of it. When they let humans play that version, suddenly the agent wasn't any slower than the human at learning. You can imagine that playing the game on the left is a lot harder, and that's because we have priors; we have a lot of knowledge about the world. So one trend, or at least a few recent papers, has been to talk about having agents play to build a model of the world: instead of trying to directly solve the task we eventually want them to solve, have them play around in the world and learn a model of what happens, so they can predict what happens when they take an action.

Another criticism of deep reinforcement learning is, like I said, that we have to retrain the agent whenever we want it to learn something new; it's a one-trick pony. That was mostly true until a couple of weeks ago, when once again DeepMind released a paper where they have one agent playing 30 games without retraining; it's the same agent playing all 30 games. So it doesn't necessarily have to be a one-trick pony anymore.
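A hedged sketch of the "learn a model of the world" idea mentioned a moment ago: train a dynamics model to predict the next observation from the current observation and action, using transitions gathered by just playing around. Sizes and data here are placeholders.

```python
# Hedged sketch of learning a world model from (obs, action, next_obs) data.
import torch
import torch.nn as nn

obs_dim, act_dim = 64, 12
dynamics = nn.Sequential(
    nn.Linear(obs_dim + act_dim, 256), nn.ReLU(), nn.Linear(256, obs_dim))
opt = torch.optim.Adam(dynamics.parameters(), lr=1e-3)

# Stand-in for transitions collected by playing around in the world.
obs = torch.rand(32, obs_dim)
act = torch.rand(32, act_dim)
next_obs = torch.rand(32, obs_dim)

pred = dynamics(torch.cat([obs, act], dim=1))
loss = nn.functional.mse_loss(pred, next_obs)   # "what happens if I do this?"
opt.zero_grad()
loss.backward()
opt.step()
```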
Finally, I want to talk more about AlphaGo. Go is a much harder game for a computer to play than chess. Chess was beaten in '97, when Kasparov lost to Deep Blue, but Deep Blue was completely hand-coded. AlphaGo of course used machine learning: first to learn the game, by looking at hundreds of thousands of human games from archives to pick up something to start with, and then it used reinforcement learning to become better. In March 2016 it beat Lee Sedol, one of the best players in the world, four to one. That was of course a huge success, but AlphaGo was pretty complicated: it had to use imitation learning to learn from humans first, and it used an enormous amount of hardware. If we look at what has happened in the two years since then, we've had a few versions of AlphaGo. AlphaGo Master is just a continuation of the AlphaGo Lee version, and it managed to beat 60 masters undefeated, so that's a much better version. But the really amazing thing is AlphaGo Zero. AlphaGo Zero is a much simpler version of AlphaGo: it uses no imitation learning, it learns completely from scratch, it uses much less hardware, and the network architecture is much simpler.

Before we look at the diagram, let's talk a bit about the Elo rating. The best human is rated a bit below 3,700. If you are rated 400 points above your opponent, you have about a 91% chance to win; 800 points above is about 99%; and 1,500 points above means there is almost only a one-in-ten-thousand chance that you will lose. So here is AlphaGo Zero's performance. The horizontal axis is training time in days, and the green line is the AlphaGo Lee version. Remember, this is with much less hardware, a much simpler algorithm, and a much simpler network model, and when the line stops here it's at around 5,200 Elo, which is 1,500 above the best human: about one in ten thousand to get beaten. This is truly superhuman performance.

After that came a version called simply AlphaZero, because it can play many different games: in this simple system, all you have to do is change the rules, nothing else. So they inserted the chess rules instead of the Go rules, and of course it became the best chess-playing program in the world as well. A cool thing from that paper is that it discovers the standard openings. You can see the hours of training on the horizontal axis, how it discovers an opening, and then, as it becomes more and more experienced, it abandons some of them. Too bad for those of us still playing the Caro-Kann Defence; according to AlphaZero, you shouldn't. But one of the most amazing things is not visible on the graph I just showed you: the best hand-crafted Go program is rated around 2,000 Elo, so within two days AlphaGo Zero beat decades of software engineering, and a thousand years of Go experience, with a simple learning algorithm. And this is one of the important points about learning: we can do things that we cannot possibly, that we are not good enough to, program ourselves.

So, deep learning is still hard, and it's still simpler to solve many problems with conventional methods rather than DL. So why do it? Well, I've been in software engineering for a long time, and this is definitely the largest boost to the capabilities of computers that I've ever seen. We can now do things that were previously impossible, in computer vision, in pose estimation, in some of the other examples I showed you; just learning to play one of these complex games was also impossible only a couple of years ago. Learning methods can quickly outperform decades of software engineering effort: you can try very, very hard to solve a complex problem, but it's becoming more and more probable that soon some machine learning method will be able to solve it better than you can as a programmer. Many difficult challenges remain, but the future potential of this is enormous, and as I said at the beginning about the AI winter: we are just starting to learn what we can do with deep learning. There is so much left, it's still more of an art than a science, and we're still in the early days of deep learning, and that's why I'm very hopeful. Thank you.
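The Elo figures quoted above follow from the standard expected-score formula, E = 1 / (1 + 10^(-diff/400)); here is a quick check of the numbers from the talk.

```python
# Quick check of the Elo figures: expected score as a function of rating gap.
def win_probability(rating_diff):
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400))

print(win_probability(400))       # ~0.91  -> 400 points ahead: ~91% to win
print(win_probability(800))       # ~0.99  -> 800 points ahead: ~99% to win
print(1 - win_probability(1500))  # ~1.8e-4 -> the talk rounds this to roughly
                                  #            one loss in ten thousand games
```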
Q: Hey, so you mentioned somewhere in the middle of the talk that we don't actually have a formal understanding of why deep learning works as well as it does, and that tuning the parameters is as much an art form as anything formalized. I'm wondering if you have any anecdotal evidence of how you actually triaged how to change parameters between different iterations of the game, or of any problem you were working on, because theoretically people can make different arguments as to whether you should keep going, whether you should try to change parameters, and so on.

A: The answer is that we didn't triage that much, because we didn't have much hardware to run many parallel trials, so we went with intuition and had some luck. I'm certain that the agents I've shown you today are in no way optimally tuned; they could become much better if we put more resources toward it, or got better theory behind why they work as they do. So no, we didn't do much hyperparameter tuning.

Q: A very impressive presentation, thank you very much. I was curious: you were talking about the network architecture and how it's very hard to tune, and it seems like one of those intractable problems that machine learning is actually very good at solving. We don't really understand how machine learning works, but it does work; we don't really know how the parameters work, but it's great at finding optimal parameters. Do you think there's some sort of future in having machine learning tune machine learning?

A: Yes, well, I don't know, but there is the topic of meta-learning, where you actually try to learn to learn, and you have another neural network controlling the learning algorithm itself. I guess that's one route toward what you just described, but otherwise I don't know.

Q: I'm curious if there's a reason why you chose vision as the primary input for your algorithms, as opposed to a more traditional knowledge representation of the world; obviously you need some element of vision, but things like ray casting, things you would give a traditional AI as knowledge in order to reason. Why was vision the input instead of that?

A: One reason was that that's how the algorithms were put together when we started; the Atari examples were vision. But as I said, in the Battlefield example we couldn't use vision anymore, because it took too long to train and was too complex, so we actually used what you said, ray casting into the world, instead. And of course we could use some kind of internal game-state representation as well; we don't have to use anything visual at all. The problem with using internal game state is that you have to be very careful not to give the agent more information than it should have, because the game state is of course perfect information, everything in the game, and you only want the partial information that a player would have, to get it to play like a player. That's also one reason to use these simple vision-like observations.

Q: I was wondering if your team does any work in using deep learning to analyze analytics data from online games, or is that outside your scope?

A: It is outside my scope, but we definitely do it within EA, yes.

Q: Did you do most of your experiments with very simple single hidden layers, or did you do a lot of experiments with different graph formations?

A: You mean the network model itself? We didn't do much experimenting. We used essentially a slightly modified Atari network, with some convolutional layers for the vision system, but we did add an LSTM to get some understanding of time and sequences. That's about it.
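A hedged sketch of the architecture described in this last answer: an Atari-style convolutional encoder over the low-resolution view, followed by an LSTM for memory over time, with one output per simultaneous button. The exact layer sizes are assumptions, not SEED's actual network.

```python
# Hedged sketch: Atari-style convolutional encoder + LSTM + multi-button head.
import torch
import torch.nn as nn

class ConvLSTMPolicy(nn.Module):
    def __init__(self, num_actions=12, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(                       # Atari-like encoder
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.lstm = nn.LSTM(input_size=64 * 7 * 7, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, num_actions)          # one logit per button

    def forward(self, frames, state=None):
        # frames: (batch, time, 3, 84, 84)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.reshape(b * t, *frames.shape[2:]))
        out, state = self.lstm(feats.reshape(b, t, -1), state)
        return self.head(out), state

logits, _ = ConvLSTMPolicy()(torch.rand(2, 4, 3, 84, 84))
```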
Q: Thank you. (An inaudible question from the floor, repeated by the speaker.)

A: So the question was whether we have considered using internal game state to speed up the training. Yes, we have considered it; we haven't done it yet. You would try to filter the game state down to approximately the same information that the player has. The problem is that you then have to do a lot of occlusion checks and other things to hide information, and that becomes expensive. It's actually much cheaper to give the agent all the information, but then of course it becomes omniscient and superhuman, and that's no fun. Okay, thank you.
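A hedged sketch of the trade-off in this last answer: filtering the full game state down to roughly what a player could perceive requires a visibility check per entity, which is the cost being discussed; handing the agent everything is cheaper but makes it omniscient. The `is_visible` check is a placeholder for a real occlusion query.

```python
# Hedged sketch: reduce omniscient game state to player-like partial information.
def is_visible(observer, entity):
    # placeholder: a real implementation would ray cast against level geometry
    return abs(observer["x"] - entity["x"]) + abs(observer["y"] - entity["y"]) < 30

def filter_game_state(observer, all_entities):
    # keep only what the observing agent could plausibly see
    return [e for e in all_entities if is_visible(observer, e)]

full_state = [{"x": 10, "y": 12}, {"x": 400, "y": 380}]    # omniscient state
partial = filter_game_state({"x": 0, "y": 0}, full_state)  # what the agent gets
```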
Info
Channel: GDC
Views: 15,745
Rating: 4.9270072 out of 5
Keywords: gdc, talk, panel, game, games, gaming, development, hd, design, deep learning
Id: yA-lJy52Ais
Length: 55min 6sec (3306 seconds)
Published: Tue Jul 28 2020