Demis Hassabis, DeepMind - Learning From First Principles - Artificial Intelligence NIPS2017

Video Statistics and Information

Reddit Comments

Watch out, the loud clapping at the beginning destroyed my ears.

u/twistor9, Jan 25 2018 (8 upvotes)
Captions
[Applause] So I'm going to talk today about learning from first principles, and I'll cover quite a few different things, including a little bit about AlphaZero, our latest work [unclear]. So our mission is to firstly try to solve intelligence, and then it's our belief that if you're able to do that, you should be able to use it to pretty much solve everything else, or at least it should affect every aspect of our lives. When we started DeepMind back in 2010, we had this sort of philosophy behind our approach: how are we going to approach this ambitious mission [unclear]? I'm just going to mention four key axes that we use to inform our philosophy about how we would like to approach building general intelligence.

For us the most important is committing to this idea of learning versus handcrafting solutions. We wanted to build systems that learn for themselves, from first principles, directly from raw data rather than being spoon-fed, and that's contrasted with solutions that the programmers [unclear]. Secondly, there's the notion of generality: we wanted a system that was able to work across a wide range of environments and tasks, potentially out of the box, including potentially ones it has never seen before, and that's versus specifically preparing [unclear] a system to do just one task. Third was this notion of groundedness versus logic-based systems. We are believers that a true thinking system, a true cognitive system, needs to be really grounded in essentially sensorimotor reality, so you need to be able to trace the origins of the knowledge that is created all the way back to the raw [unclear], in contrast to logic-based systems, which are sort of these ethereal systems where [unclear] floating, kind of unconnected to the actual [unclear]. Fourth is this notion of active versus passive learning. So
systems [unclear] are more encompassing. So, for example, the classic classifier gets an input and then outputs a decision about that input, but what we were after is agent-based systems that are active participants in their own learning, much like Alison was mentioning earlier about how children learn [unclear]. So these axes are important when thinking about AI, and we committed to what we thought was the right approach, which was to be at the left-hand extreme here on all of these four axes. Taken together, that gives you the idea of a system that's a general-purpose learning agent, and that's what we internally would call artificial general intelligence; that's what we're trying to achieve.

Now, in order to do this, we actually look to neuroscience, systems neuroscience and cognitive psychology ideas, including child development and animal intelligence, as inspiration for algorithmic ideas and architectural ideas that we can borrow and build our systems around. Our first attempt at this type of system was a system called DQN, where we basically put together deep learning and reinforcement learning at scale to form deep reinforcement learning, as it is now called, and tried to do something impressive with it in terms of seeing how far you can push the system. So you have an end-to-end learning system, and it's an agent that just takes the pixels and a reward signal, the score for the game, and then has to figure everything else out for itself. We use games and simulations as efficient platforms for testing and developing our algorithms, because we can run millions of versions of these at once, versus something like having to build robots. Our first big result, as many of you know, was applying this to Atari games, the classic Atari games [unclear], and building one system that could master dozens and dozens of very different Atari games.
So we trained our neural network system end to end, which we called DQN, deep Q-network, and basically it is a very large convolutional net that uses the reward signal to modify its weights, and it successfully plays these dozens of Atari games. I'm not going to spend very long on this work; I want to spend the rest of the time talking about our next piece of work, AlphaGo, which is our system to play the game of Go.

For those of you who don't know, this is what the Go board looks like. It's played on a 19 by 19 board, and you take turns to place stones on the intersections of the board. The aim of the game is to surround your opponent's pieces with your pieces, or to wall off areas of territory. When you get to the end of the game, and this is a position at the end of a game, you basically count up the amount of territory you have surrounded and compare it to your opponent's territory, and whoever has the most points wins the game. In this case white wins by just a single point; it's a very close game.

Now, Go is an extremely simple game to learn; there are only two rules, really. It's very elegant, because from that simplicity comes incredible complexity, and one way of illustrating that is the fact that there are 10 to the power 170 possible board positions in Go, which is way more than there are atoms in the observable universe. So that's one of the reasons why Go is so hard for computers to play. In the late nineties, IBM's Deep Blue famously beat Garry Kasparov at chess, and since then people have spent the next [unclear] trying to apply those same techniques to the game of Go, but really they were never able to get further than strong amateur level, let alone world champion level.

One of the problems is this enormous search space, the 10 to the power 170 possible positions, but actually an even more interesting and harder problem is trying to create an evaluation function. It was thought impossible, in fact it may still be impossible, to directly write
down an evaluation function for the game of Go that will allow the computer system to determine who is winning a position, and that's obviously very critical to the sorts of traditional techniques like alpha-beta search. The reason for this in Go, in contrast to chess, is that there's no concept of materiality: all the pieces are the same, so there's no easy heuristic to tell you who's currently winning. Another problem with Go is that it is a constructive game, whereas chess is a destructive game. What I mean by that is that in chess you start off with all the pieces on the board, and as the game goes on pieces are removed, so the game gets simpler. In Go the board starts empty and you fill it up. So if you're in a middlegame position in chess, all the information is there in the current board position, but in Go you only have half the information, and you also have to project forward in time and think about what might happen in order to figure out who's going to win. It's a lot less obvious from the current position, because of the constructive nature of Go.

So how do humans deal with this, especially the top professional Go players? How do they do that? Well, if you talk to Go players, they'll tell you that the game is primarily one about intuition. They rely on their instincts and intuition rather than calculation, which is what chess players do: they explicitly calculate out particular plans, whereas Go is much more about feel, that this move feels right in this particular situation. So we thought this was a great challenge for us and our learning systems, since we knew that the traditional brute-force systems don't work with Go, and that's why we chose Go as a good target for the types of learning systems we wanted to build.

We dealt with these two problems by using two neural networks. The first neural network, which we call the policy network, here, over
there in green, we trained on a hundred thousand to several hundred thousand human games, strong amateur games that we downloaded from the internet. What we did with the policy network was train it to predict what move the human expert played, and it got pretty good at this, to the extent that if you give it any random board position as input, it will output a probability distribution over how likely it thinks a human is to play certain moves in the current position. That means you can basically look at just the top three or five most likely moves, rather than looking at all the 200 possible moves there are on average in a particular position. So this narrows down the width of the search, essentially the branching factor of the search tree.

Then we trained the second neural network on the data produced by this policy network playing against itself many millions of times. We took positions from those games, and we also knew which side ended up winning or losing ultimately, and we trained the second neural network, the value network, there in pink, to predict from the current position who is likely to be the winner and what the confidence level in that is. So the value network returns a real number between zero and one: zero meaning white is very confidently winning, one meaning black is confidently winning, and 0.5 meaning the position is evenly balanced. [unclear] We put this together with Monte Carlo tree search, which is a very important component of the overall system and which calls these two neural networks.

And then we challenged one of the legends of the game, a South Korean genius called Lee Sedol. He's a legend of the game, with eighteen world titles, and I think of him as kind of the Roger Federer of Go [unclear] across decades. We had this one-million-dollar challenge match [unclear] out in South
Korea, and it was a huge event, really kind of an amazing experience, a once-in-a-lifetime experience. Over 200 million people watched the matches online or on TV. To everyone's surprise, AlphaGo beat Lee Sedol 4-1. This was proclaimed to be about a decade before its time: when I spoke to several of the world experts in computer Go in the months before the match, they were all predicting, and said in their interviews, that it was at least ten years away using the current methods, the traditional methods, to get anywhere close to world champion. So this was pretty surprising for everyone.

But what was more interesting than just the fact that AlphaGo won was how AlphaGo won: what kinds of moves it made and what kinds of ideas it came up with. Because it was basically trained through this notion of self-play, it actually came up with new ideas and new motifs that it discovered through that self-play. One famous example from the match is move 37 in the second game. AlphaGo is black, and AlphaGo played this stone here on the right-hand side, which you can see indicated in red with the white triangle. This was really shocking for the commentators viewing the match; there are several fun YouTube clips of their reactions where they literally fall off their chairs. The reason is that this move is played on the fifth line. This is a 19 by 19 board, and the stone is placed on the fifth line in from the edge. Now, in the opening parts of the game you generally only play on the third and fourth lines, since playing on the fifth line gives away territory on the side of the board along the fourth line, which is quite a lot of territory, so quite a lot of points at the end of the game. So it's basically inconceivable to play a move like this, and if you are an amateur and you play like this, your teacher will quickly correct you. So this was an original, kind of unthinkable move. But, you know,
in China Go is considered to be sort of an art form, and in Asia generally it's seen as kind of an art form, but it's interesting because it's like objective art. Any one of us in this audience could come up with a novel move by playing around randomly in this position, but the key thing, whether a move is considered to be beautiful or brilliant, is whether it actually affected the outcome of the game, and that's why move 37 is considered so beautiful. About a hundred moves later, these two stones here in the bottom left-hand corner, which I've indicated with white triangles, were in a bit of trouble because they're surrounded by white stones. The fight in the bottom left-hand corner spread all the way across the board and perfectly connected up with the move 37 black stone, which was in the perfect position to decide that fight [unclear], and that's why AlphaGo won the game. So it's almost as if it sort of divined ahead of time that a hundred or so moves later that stone would be in the critical position, even though it looked like a bad move locally.

Of course, I should mention that Lee Sedol himself was inspired to play his own genius move, move 78 in game four, which he won. I haven't got time to cover that, but there is actually a documentary now, directed by a brilliant guy called Greg Kohs, and it has been shown at all sorts of film festivals; I recommend you check that out if you want to know more.

So I've talked a bit about intuition and a little bit about creativity, but what do I mean by those terms? I'm not saying that these definitions encompass all the complexity of these terms, but I think they are reasonable operational definitions we can think about. I think of intuition as simply implicit knowledge that we have, which must be acquired through experience, of course, but is not consciously expressible or accessible. So we
can't access it consciously ourselves, and we certainly can't express it to anyone else. The second thing is, you might think, well, if we can't access that kind of knowledge, how do we know that it's there? But of course we can test it behaviorally, so we can verify the existence, as well as the quality, of it by testing behavior. In Go this is very easy: you give the entity or person a board position, and you can evaluate the quality of the decision that they make.

Secondly, what about creativity? Again, there are many dimensions to creativity, but I think one reasonable operational definition is the ability to synthesize knowledge that you already have in a new way to produce a novel or original idea in the service of some kind of goal. I think, certainly within the constrained domain of Go, AlphaGo clearly demonstrated these abilities, and most professional players would agree.

So we continued with the development of AlphaGo, because our mission was not just to create a program that was good at Go; we wanted to make progress towards general AI. The way we approach this at DeepMind is to try to make something work in whatever way is possible, and then look at it [unclear] and try to make it a little bit simpler; simplifying it usually makes it more general. So that's what we did with AlphaGo Zero, which is the latest version of AlphaGo. We wanted to remove this part, the bootstrapping from human data, because often you don't have any human data, and it's also specific to that particular domain. So in this case we started completely from random play, with zero knowledge, and AlphaGo Zero plays against itself, learning incrementally from its mistakes. What we ended up with was a much stronger, more efficient and more general version of the algorithm. We changed the architecture a little bit; we created a [unclear] architecture, so
AlphaGo Zero only has one neural network, which outputs two things, so the policy network and the value network are merged into one neural network, and we think this is one of the reasons why AlphaGo Zero is stronger, because this results in better regularization.

And then it's actually quite a simple algorithm. AlphaGo Zero plays about 25,000 games in one batch against itself, at the current full strength that it has. Then we take all that new data and we train a new policy network to predict that version of AlphaGo's moves, what that version would do. We also train a value network that predicts the winners of those games. Then we put those two together into a new iteration of AlphaGo Zero automatically, and then we pitch that new version against the old version. If it wins 55% or more, so it's significantly better, then it replaces the old version as the new generator of the next batch of data. If it's not significantly better, then we continue with the old version, collect more data, and try to train again after another 25,000 games.

So this is the curve of improvement: performance is on the y-axis and time in days is on the x-axis. You can see that after just three days AlphaGo Zero surpasses the version of AlphaGo that played against Lee Sedol [unclear]. After 21 days it beats the best version of AlphaGo that we had created before, AlphaGo Master, which beat [unclear] top professional players. And then you'll notice that after 40 days we stopped running the program, but it still hadn't reached an asymptote, so we never actually found the limit of this self-improvement process. It's at something pretty crazy, like 5,000 Elo, but it's still improving, so it's an open question what the final optimal play and asymptote would be. And then we went back and looked through the different plays that the machine was
doing, the moves AlphaGo Zero was playing at different stages of training, and we found that, as you would expect, at the beginning it's playing pretty much randomly, placing stones in stupid places like the corner of the board. Then over time it starts rediscovering the best joseki. So after 36 or 40 hours it's quite commonly playing what are called joseki, which are the best sequences humans know about. And then the frequency of those plays drops off after about 72 hours, and we think that's because AlphaGo Zero decides that there are actually better approaches, and then it starts inventing its own ones, and these new ones are now being analyzed and utilized by human players around the professional Go world. So really, AlphaGo Zero rediscovered the Go knowledge that has been around for 3,000 years, and then went past it. You can read more details about this in the two papers that we published, [unclear] this year.

Finally, I want to talk a little bit about our latest iteration, AlphaZero, which is very important. We have dropped the "Go" from the name, so it's just AlphaZero, a generic game-playing program. Some people speculated, I think, even after AlphaGo Zero, about whether it could play other games like chess and shogi. We believed it could, but we decided to actually test that hypothesis out. We built AlphaZero and trained it on three major perfect-information games: chess, Go and shogi, which is Japanese chess [unclear]. I'm mostly going to focus on AlphaZero in chess today, which is the game I know best.

When we started this off, I talked to a few of my friends from the chess world, grandmasters and ex-world champions, and I asked them the question: how far off from optimal play did they think the current chess engines were? As you all know, chess engines are far beyond human world champion level
[unclear] and they've got ratings of over three thousand Elo, and they're kind of unbeatable [unclear]. As a chess player myself, I thought they were probably pretty close to optimal [unclear], and it wasn't clear that a learning system could be as good as these handcrafted systems, given the amount of effort that has gone into building those systems over the last twenty or thirty years. Chess is probably one of the most, if not the most, studied domains in the history of AI. Computer chess was there at the birth and the dawn of computer science: people like Turing, Shannon and von Neumann were thinking about computer chess in the early days of computer science. The current programs are all kind of descendants of these highly specialized systems, which have obviously been very successful at playing chess, and they use alpha-beta search and a whole bag of handcrafted heuristics distilled from human grandmasters.

So if you look at Stockfish, which was the program that won the 2016 world championship, this is the list of special-case heuristics and extensions that Stockfish has: a gigantic number of [unclear] specific types of techniques to help it play chess [unclear]. With AlphaZero we just replace all of that hand engineering with self-play reinforcement learning and Monte Carlo tree search. So there's no opening book in AlphaZero, there are no endgame databases, no heuristics at any point; it just starts from completely random play. As a chess player I wasn't sure this would work, given how important openings and endgames are, and, you know, Stockfish plays [unclear] nearly perfectly. The other thing we wanted, again in line with this notion of generality, was to use the same hyperparameters for all three games. And much to our surprise, the first time we
tried to run this, we found that AlphaZero beat the state-of-the-art programs in all three games. In chess it beat Stockfish 28-0; it didn't lose one game [unclear]. In shogi it beat the world champion program, Elmo, 90 to 8. And then AlphaZero beat AlphaGo Zero 60 to 40, after three days of training, compared against the three-day version of AlphaGo Zero. AlphaZero reached the strength of Stockfish in four hours, Elmo in two hours, and AlphaGo, the version that played Lee Sedol, in eight hours. So this whole thing, all three games, took less than 24 hours.

One thing you might ask is whether it's a question of compute power. We honestly don't think it is. I mean, we use Google's tensor processing units, TPUs, in the cloud, and that's quite a big amount of power, but actually an easy way to measure this is how many moves are searched per decision made. Human grandmasters search about tens of moves per decision they make. Chess engines like Stockfish search tens of millions of moves for each decision they make. And AlphaZero is right in the middle of the spectrum, using about tens of thousands of moves per decision. So it's much less than those traditional chess engines. Obviously it's still three orders of magnitude more than humans do, but it's moving in the right direction.

[unclear] We did a quick analysis, and we'll go into more analysis in the full paper [unclear], where we looked at the openings discovered by AlphaZero, and I'm pleased to say that my favorite, the English Opening, [unclear] survived this eight hours of training, whereas surprisingly some really popular openings, like, say, the King's Indian Defense, were not really favored by AlphaZero at all at any stage. And if you look at AlphaZero's
style, I would say it doesn't play like a human, but it also doesn't play like computer engines; it plays in a third, almost alien way. We could analyze this a bit more, but we've had a few chess players look at this, including one on our team, and what stands out is long-term positional sacrifices. That isn't really a thing in chess, because sacrifices are mostly about tactics. With a sacrifice, you maybe give your queen away because you calculate that you can checkmate your opponent a few moves later. These are tactical sacrifices; they're quite unusual, but they're there in the play of world champions like Mikhail Tal, who was famous for those kinds of sacrifices. What AlphaZero does is make sacrifices for some very long-term positional gain, and I was surprised that chess had the capacity to allow for things like long-term positional sacrifices, so it appears that this is an actual thing in chess.

I think the reason, my initial take on that, is that it doesn't have a concept of materiality. It doesn't have a rule telling it that rooks are worth five points, knights are worth three points, pawns are worth one point; it doesn't know anything about that. So for it everything is contextual. It thinks something like: in this kind of position, my rook that can move like this isn't that useful, whereas my opponent's knight is really powerful in this position, so I'm going to swap them. And it doesn't have to overcome some rule built into the system that tells it that it's going to lose two points by swapping the rook for the knight. Now obviously Stockfish, or the world champions, make sacrifices, but they have to calculate enough to see some end point where they will get that material back or there'll be some other compensation for it, whereas AlphaZero doesn't need to do any calculation about getting that material back at some point.
Now, right, so the key thing here is that rigid rules like that, in the traditional chess engines, can't be dynamically balanced depending on the situation. You can't just have another if-then rule saying that in this specific position this rook is not worth five points, it's worth three points; it's always worth five points, it's hard-coded in the heuristics [unclear]. This could be very exciting for the world of chess, and I think humans can learn from it, because these ideas are strategic rather than tactical, so we can actually incorporate them into our own play.

I just want to, near the end of my talk, show you a couple of positions for the chess fans. I love games, so there's quite a lot of game content in this talk. This is a position showing these kinds of long-term sacrifices. AlphaZero is white here, and basically what it has done is sacrifice [unclear]. You can see the black bishop there in the top left corner [unclear]. I think what's happened is it's decided that that bishop there, the black one, is hemmed in in the corner and isn't worth very much, so it's definitely not worth three points [unclear]. AlphaZero just played this knight move to the corner [unclear] check the king, and it has sacrificed this knight; incredibly, just for Stockfish to
take the knight [unclear], and this is pretty amazing. I showed this to one of the grandmasters, who couldn't believe it [unclear]. So, you know, I just think there's a lot more here to explore; I see this as kind of chess from another dimension.

And Garry Kasparov himself, actually before we saw the AlphaZero results, was talking about Deep Blue as the end of some kind of era, and he felt that, because it's a learning system, [unclear] this type of system is some kind of beginning. [unclear] I know Gary Marcus will talk later on [unclear] some of his comments.

On limitations: obviously with this system [unclear], what you have to have is a clear objective function, and you have to have either large amounts of data or an efficient simulator, as in this case, to generate the data, and discrete decisions. AlphaZero lacks a lot of things: it can't generate hypotheses [unclear], and it hasn't [unclear] abstract knowledge or causal reasoning [unclear]. So we concede all those things, but I do want to say that if you cast your mind back a few years, pretty much everyone in this room would have thought that these results were kind of unthinkable. And this system is really general: it's the general games machine that I always dreamed about [unclear], and we have it now, at least for perfect-information games. Obviously we're not claiming this is all [unclear]. There are huge challenges: [unclear] imperfect information, uncertainty, and there's sample efficiency, if you want to talk about that, and these are all things that we're working on in other
projects, and we think that they're very [unclear] for general intelligence. But there's one caveat on the human sample-efficiency dimension, which is that it's not quite right to simply compare AlphaZero, which had [unclear] games, with, as we worked out, someone like [unclear] who plays [unclear] games in their lifetime. That seems like a huge discrepancy, but don't forget that humans are coming to this with prior knowledge already, and also with transfer learning and, more importantly, apprenticeship. Obviously they read books on Go, which are distillations of many other humans' knowledge of the game, and they are taught by Go masters. So I think in the future one of the cool things you could actually test is the information content of a book, in the sense that you could have one version of AlphaGo Zero that learns purely through self-play, and another version that, once it's able to read books, reads the book and then plays itself, and then you can compare how many games of self-play that book is worth. It will be some time before we're able to do that. [unclear] What I discuss a lot with my colleagues is that the minimum set of massively general priors is what we're after.

And obviously, just to end, we're not looking to just play games with our systems; that's just our proving ground and testing ground, albeit a fun one. Really what we're trying to do is apply these [unclear]; we're starting to do that in areas like healthcare [unclear] and all sorts of other things. For me the most exciting thing would be to try and apply this to areas of science [unclear]. I think there are already areas of science that may be amenable to this type of approach, like drug design, material design and some aspects of biotech, and I think
ultimately this is the area I'm looking forward to [unclear]: the power of humans and machines together, us and our smart tools figuring out some complex scientific questions. I think human ingenuity augmented by AI will unlock [unclear] that potential.
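
The AlphaGo Zero training loop described in the talk (play a batch of about 25,000 self-play games, train a new network on that data, then keep the new version only if it wins 55% or more against the old one) can be sketched roughly as follows. This is a toy illustration, not DeepMind's code: the functions `self_play`, `train_candidate` and `evaluate` are hypothetical stand-ins, playing strength is reduced to a single number, and the 400-game evaluation match and the logistic win-probability model are assumptions made only so the example runs.

```python
import random

GAMES_PER_BATCH = 25_000   # batch size quoted in the talk
WIN_THRESHOLD = 0.55       # replacement threshold quoted in the talk


def self_play(strength, n_games):
    """Hypothetical stand-in: return n_games placeholder game records."""
    return [strength] * n_games


def train_candidate(best_strength, games, rng):
    """Hypothetical stand-in for retraining on the new games: the candidate's
    strength is the current best plus noise, so some candidates are weaker."""
    return best_strength + rng.uniform(-0.5, 1.0)


def evaluate(candidate, best, rng, n_games=400):
    """Play an evaluation match and return the candidate's win rate.
    The logistic win probability and the match length are assumptions."""
    p_win = 1.0 / (1.0 + 10 ** (best - candidate))
    wins = sum(rng.random() < p_win for _ in range(n_games))
    return wins / n_games


def training_loop(iterations, seed=0):
    """Run the gated self-play loop and return the best strength over time."""
    rng = random.Random(seed)
    best = 0.0
    history = [best]
    for _ in range(iterations):
        games = self_play(best, GAMES_PER_BATCH)
        candidate = train_candidate(best, games, rng)
        if evaluate(candidate, best, rng) >= WIN_THRESHOLD:
            best = candidate      # candidate becomes the new data generator
        history.append(best)      # otherwise the old version carries on
    return history


if __name__ == "__main__":
    print(training_loop(10))
```

The gate is the point of the sketch: a candidate that is not clearly better never gets to generate the next batch of training data, so one bad training run does not contaminate later iterations.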
Info
Channel: The Artificial Intelligence Channel
Views: 24,948
Rating: 4.8207545 out of 5
Keywords: singularity, ai, artificial intelligence, deep learning, machine learning, deepmind, robots, robotics, self-driving cars, driverless cars, Demis Hassabis, neuroscience, alphago, lee sedol
Id: DXNqYSNvnjA
Length: 35min 40sec (2140 seconds)
Published: Wed Jan 24 2018