Geoffrey Hinton | On working with Ilya, choosing problems, and the power of intuition

Have you reflected a lot on how to select talent, or has that mostly been intuitive to you? Ilya just shows up and you're like, this is a clever guy, let's work together? Or have you thought a lot about that?

Are we recording? Should we roll this? Yeah, let's roll this. Okay, we're good. Yeah, okay, s is working.

So I remember when I first got to Carnegie Mellon from England. In England, at a research unit, it would get to be 6:00 and you'd all go for a drink in the pub. At Carnegie Mellon, I remember after I'd been there a few weeks, it was Saturday night, I didn't have any friends yet and I didn't know what to do, so I decided I'd go into the lab and do some programming, because I had a Lisp machine and you couldn't program it from home. So I went into the lab at about 9:00 on a Saturday night and it was swarming. All the students were there, and they were all there because what they were working on was the future. They all believed that what they did next was going to change the course of computer science, and it was just so different from England. So that was very refreshing.

Take me back to the very beginning, Geoff, at Cambridge, trying to understand the brain. What was that like?

It was very disappointing. So I did physiology, and in the summer term they were going to teach us how the brain worked, and all they taught us was how neurons conduct action potentials, which is very interesting, but it doesn't tell you how the brain works. So that was extremely disappointing. I switched to philosophy then; I thought maybe they'd tell us how the mind worked. That was very disappointing. I eventually ended up going to Edinburgh to do AI, and that was more interesting. At least you could simulate things, so you could test out theories.

And do you remember what intrigued you about AI? Was it a paper? Was it any particular person that exposed you to those ideas?

I guess it was a book I read by Donald Hebb that influenced me a lot. He was very interested in how you learn the
connection strengths in neural nets. I also read a book by John von Neumann early on, who was very interested in how the brain computes and how it's different from normal computers.

And did you get that conviction that these ideas would work out at that point? What was your intuition back in the Edinburgh days?

It seemed to me there has to be a way that the brain learns, and it's clearly not by having all sorts of things programmed into it and then using logical rules of inference. That just seemed to me crazy from the outset. So we had to figure out how the brain learned to modify connections in a neural net so that it could do complicated things. And von Neumann believed that, Turing believed that. So von Neumann and Turing were both pretty good at logic, but they didn't believe in this logical approach.

And what was your split between studying the ideas from neuroscience and just doing what seemed to be good algorithms for AI? How much inspiration did you take early on?

So I never did that much study of neuroscience. I was always inspired by what I'd learned about how the brain works: that there's a bunch of neurons, they perform relatively simple operations, they're nonlinear, but they collect inputs, they weight them, and then they give an output that depends on that weighted input. And the question is how do you change those weights to make the whole thing do something good. It seems like a fairly simple question.

What collaborations do you remember from that time?

The main collaboration I had at Carnegie Mellon was with someone who wasn't at Carnegie Mellon. I was interacting a lot with Terry Sejnowski, who was in Baltimore at Johns Hopkins, and about once a month either he would drive to Pittsburgh or I would drive to Baltimore. It's 250 miles away. And we would spend a weekend together working on Boltzmann machines. That was a wonderful collaboration. We were both convinced it was how the brain worked. That was the most exciting research I've ever done, and a lot of technical
results came out that were very interesting, but I think it's not how the brain works. I also had a very good collaboration with Peter Brown, who was a very good statistician, and he worked on speech recognition at IBM, and then he came as a more mature student to Carnegie Mellon just to get a PhD. But he already knew a lot. He taught me a lot about speech, and he in fact taught me about hidden Markov models. I think I learned more from him than he learned from me. That's the kind of student you want. And when he taught me about hidden Markov models, I was doing backprop with hidden layers, only they weren't called hidden layers then, and I decided that the name they use in hidden Markov models is a great name for variables that you don't know what they're up to. And so that's where the name "hidden" in neural nets came from. Me and Peter decided that was a great name for the hidden layers in neural nets. But I learned a lot from Peter about speech.

Take us back to when Ilya showed up at your office.

I was in my office, probably on a Sunday, and I was programming, I think, and there was a knock on the door. Not just any knock, but a sort of urgent knock. So I went and answered the door, and there was this young student there, and he said he was cooking fries over the summer, but he'd rather be working in my lab. And so I said, well, why don't you make an appointment and we'll talk? And so Ilya said, how about now? And that sort of was Ilya's character. So we talked for a bit and I gave him a paper to read, which was the Nature paper on backpropagation, and we made another meeting for a week later, and he came back and he said, I didn't understand it. And I was very disappointed. I thought, he seemed like a bright guy, but it's only the chain rule, it's not that hard to understand. And he said, oh no, no, I understood that. I just don't understand why you don't give the gradient to a sensible function optimizer. Which took us quite a few years to think about. And it kept on
like that with him. His raw intuitions about things were always very good.

What do you think had enabled those intuitions for Ilya?

I don't know. I think he always thought for himself. He was always interested in AI from a young age. He's obviously good at math. But it's very hard to know.

And what was that collaboration between the two of you like? What part would you play and what part would Ilya play?

It was a lot of fun. I remember one occasion when we were trying to do a complicated thing with producing maps of data, where I had a kind of mixture model, so you could take the same bunch of similarities and make two maps, so that in one map "bank" could be close to "greed" and in another map "bank" could be close to "river". Because in one map you can't have it close to both, right? Because "river" and "greed" are a long way apart. So we'd have a mixture of maps. And we were doing it in MATLAB, and this involved a lot of reorganization of the code to do the right matrix multiplies, and Ilya got fed up with that. So he came one day and said, I'm going to write an interface for MATLAB, so I program in this different language and then I have something that just converts it into MATLAB. And I said, no, Ilya, that'll take you a month to do. We've got to get on with this project. Don't get diverted by that. And he said, it's okay, I did it this morning.

That's quite incredible. And throughout those years, the biggest shift wasn't necessarily just the algorithms but also the scale. How did you view that scale over the years?

Ilya got that intuition very early. So Ilya was always preaching that you just make it bigger and it'll work better. And I always thought that was a bit of a cop-out, you're going to have to have new ideas too. It turns out Ilya was basically right. New ideas help; things like Transformers helped a lot. But it was really the scale of the data and the scale of the computation. And back then we had no idea computers would
get like a billion times faster. We thought maybe they'd get a hundred times faster. We were trying to do things by coming up with clever ideas that would have just solved themselves if we had had bigger scale of the data and computation. In about 2011, Ilya and another graduate student called James Martens and I had a paper using character-level prediction. So we took Wikipedia and we tried to predict the next HTML character, and that worked remarkably well. And we were always amazed at how well it worked. And that was using a fancy optimizer on GPUs, and we could never quite believe that it understood anything, but it looked as though it understood, and that just seemed incredible.

Can you take us through how these models are trained to predict the next word, and why is it the wrong way of thinking about them?

Okay, I don't actually believe it is the wrong way. So in fact I think I made the first neural net language model that used embeddings and backpropagation. So it's very simple data, just triples, and it was turning each symbol into an embedding, then having the embeddings interact to predict the embedding of the next symbol, and from that predict the next symbol, and then it was backpropagating through that whole process to learn these triples. And I showed it could generalize. About 10 years later Yoshua Bengio used a very similar network and showed it worked with real text, and about 10 years after that linguists started believing in embeddings. It was a slow process. The reason I think it's not just predicting the next symbol is, if you ask, well, what does it take to predict the next symbol? Particularly if you ask me a question and then the first word of the answer is the next symbol, you have to understand the question. So I think by predicting the next symbol, it's very unlike old-fashioned autocomplete. Old-fashioned autocomplete, you'd store sort of triples of words, and then if you saw a pair of words, you'd see how often different words came third, and that way you can predict the next symbol.
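
The old-fashioned autocomplete described here (store triples of words, then look up which word most often came third after a given pair) can be sketched in a few lines. This is only an illustration under stated assumptions, not code from the interview: the function names `build_trigram_table` and `predict_next` and the toy corpus are made up for the example.

```python
from collections import Counter, defaultdict


def build_trigram_table(words):
    """Count, for every adjacent pair of words, which words came third."""
    table = defaultdict(Counter)
    for first, second, third in zip(words, words[1:], words[2:]):
        table[(first, second)][third] += 1
    return table


def predict_next(table, first, second):
    """Return the most frequent third word after (first, second), or None if the pair was never seen."""
    followers = table.get((first, second))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus (hypothetical); a real table would be built from a large text collection.
corpus = "the cat sat on the mat and the cat sat on the sofa and the cat ran".split()
table = build_trigram_table(corpus)

print(predict_next(table, "the", "cat"))  # most common continuation of "the cat"
print(predict_next(table, "a", "dog"))    # pair never seen, so no prediction
```

A table like this can only replay counts for pairs it has already seen; it has nothing to say about an unseen pair, which is the contrast being drawn with models that predict the next symbol through learned embeddings.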
And that's what most people think autocomplete is like. It's no longer at all like that. To predict the next symbol you have to understand what's been said. So I think you're forcing it to understand by making it predict the next symbol, and I think it's understanding in much the same way we are. So a lot of people will tell you these things aren't like us, they're just predicting the next symbol, they're not reasoning like us. But actually, in order to predict the next symbol it's going to have to do some reasoning. And we've seen now that if you make big ones, without putting in any special stuff to do reasoning, they can already do some reasoning, and I think as you make them bigger they're going to be able to do more and more reasoning.

Do you think I'm doing anything else than predicting the next symbol right now?

I think that's how you're learning. I think you're predicting the next video frame, you're predicting the next sound. But I think that's a pretty plausible theory of how the brain's learning.

What enables these models to learn such a wide variety of fields?

What these big language models are doing is they're looking for common structure, and by finding common structure they can encode things using the common structure, and that's more efficient. So let me give you an example. If you ask GPT-4, why is a compost heap like an atom bomb? Most people can't answer that. Most people haven't thought; they think atom bombs and compost heaps are very different things. But GPT-4 will tell you, well, the energy scales are very different and the time scales are very different, but the thing that's the same is that when the compost heap gets hotter it generates heat faster, and when the atom bomb produces more neutrons it produces more neutrons faster. And so it gets the idea of a chain reaction. And I believe it's understood they're both forms of chain reaction. It's using that understanding to compress all that information into its weights. And if it's doing that, then it's going to
be doing that for hundreds of things where we haven't seen the analogies yet, but it has. And that's where you get creativity from, from seeing these analogies between apparently very different things. And so I think GPT-4 is going to end up, when it gets bigger, being very creative. I think this idea that it's just regurgitating what it's learned, just pasting together text it's learned already, that's completely wrong. It's going to be even more creative than people, I think.

You'd argue that it won't just repeat the human knowledge we've developed so far but could also progress beyond that. I think that's something we haven't quite seen yet. We've started seeing some examples of it, but to a large extent we're sort of still at the current level of science. What do you think will enable it to go beyond that?

Well, we've seen that in more limited contexts. Like if you take AlphaGo, in that famous competition with Lee Sedol, there was move 37, where AlphaGo made a move that all the experts said must have been a mistake, but actually later they realized it was a brilliant move. So that was creative within that limited domain. I think we'll see a lot more of that as these things get bigger.

The difference with AlphaGo as well was that it was using reinforcement learning, which subsequently sort of enabled it to go beyond the current state. So it started with imitation learning, watching how humans play the game, and then it would, through self-play, develop way beyond that. Do you think that's the missing component of the...

I think that may well be a missing component, yes. The self-play in AlphaGo and AlphaZero is a large part of why it could make these creative moves. But I don't think it's entirely necessary. So there's a little experiment I did a long time ago where you're training a neural net to recognize handwritten digits. I love that example, the MNIST example. And you give it training data where half the answers are wrong. And the question is, how well
will it learn? And you make half the answers wrong once and keep them like that, so it can't average away the wrongness by just seeing the same example but with the right answer sometimes and the wrong answer sometimes. For half of the examples, when it sees the example, the answer is always wrong. And so the training data has 50% error. But if you train it up with backpropagation, it gets down to 5% error or less. In other words, from badly labeled data it can get much better results. It can see that the training data is wrong. And that's how smart students can be smarter than their advisor: their advisor tells them all this stuff, and for half of what their advisor tells them they think, no, rubbish, and they listen to the other half, and then they end up smarter than the advisor. So these big neural nets can actually do much better than their training data, and most people don't realize that.

So how do you expect these models to add reasoning into them? I mean, one approach is you add sort of the heuristics on top of them, which a lot of the research is doing now, where you have sort of chain of thought, you just feed back its reasoning into itself. And another way would be in the model itself, as you scale it up. What's your intuition around that?

So my intuition is that as we scale up these models they get better at reasoning. And if you ask how people work, roughly speaking, we have these intuitions and we can do reasoning, and we use the reasoning to correct our intuitions. Of course we use the intuitions during the reasoning to do the reasoning, but if the conclusion of the reasoning conflicts with our intuitions, we realize the intuitions need to be changed. That's much like in AlphaGo or AlphaZero, where you have an evaluation function that just looks at a board and says, how good is that for me? But then you do the Monte Carlo rollout and now you get a more accurate idea, and you can revise your evaluation function. So you can train
it by getting it to agree with the results of reasoning. And I think these large language models have to start doing that. They have to start training their raw intuitions about what should come next by doing reasoning and realizing that's not right. And so that way they can get more training data than just mimicking what people did. And that's exactly why AlphaGo could do this creative move 37. It had much more training data because it was using reasoning to check out what the right next move should have been.

And what do you think about multimodality? So we spoke about these analogies, and often the analogies are way beyond what we could see. It's discovering analogies that are far beyond humans and at maybe abstraction levels that we'll never be able to understand. Now when we introduce images to that, and video and sound, how do you think that will change the models, and how do you think it will change the analogies that it will be able to make?

I think it'll change it a lot. I think it'll make it much better at understanding spatial things, for example. From language alone it's quite hard to understand some spatial things, although remarkably GPT-4 can do that even before it was multimodal. But when you make it multimodal, if you have it both doing vision and reaching out and grabbing things, it'll understand objects much better if it can pick them up and turn them over and so on. So although you can learn an awful lot from language, it's easier to learn if you're multimodal, and in fact you then need less language. And there's an awful lot of YouTube video for predicting the next frame, or something like that. So I think these multimodal models are clearly going to take over. You can get more data that way; they need less language. So there's really a philosophical point that you could learn a very good model from language alone, but it's much easier to learn it from a multimodal system.

And how do you think it will impact the model's reasoning?

I think it'll make it much
better at reasoning about space, for example, reasoning about what happens if you pick objects up. If you actually try picking objects up, you're going to get all sorts of training data that's going to help.

Do you think the human brain evolved to work well with language, or do you think language evolved to work well with the human brain?

I think the question of whether language evolved to work with the brain or the brain evolved to work with language, I think that's a very good question. I think both happened. I used to think we would do a lot of cognition without needing language at all. Now I've changed my mind a bit. So let me give you three different views of language and how it relates to cognition. There's the old-fashioned symbolic view, which is cognition consists of having strings of symbols in some kind of cleaned-up logical language where there's no ambiguity, and applying rules of inference, and that's what cognition is: it's just these symbolic manipulations on things that are like strings of language symbols. So that's one extreme view. An opposite extreme view is: no, no, once you get inside the head it's all vectors. So symbols come in, you convert those symbols into big vectors, and all the stuff inside's done with big vectors, and then if you want to produce output you produce symbols again. So there was a point in machine translation in about 2014 when people were using recurrent neural nets, and words would keep coming in and they'd have a hidden state, and they'd keep accumulating information in this hidden state. So when they got to the end of a sentence they'd have a big hidden vector that captured the meaning of that sentence, that could then be used for producing the sentence in another language. That was called a thought vector. And that's a sort of second view of language: you convert the language into a big vector that's nothing like language, and that's what cognition is all about. But then there's a third view, which is what I believe now, which is that you
take these symbols and you convert the symbols into embeddings and you use multiple layers of that, so you get these very rich embeddings. But the embeddings are still tied to the symbols, in the sense that you've got a big vector for this symbol and a big vector for that symbol, and these vectors interact to produce the vector for the symbol for the next word. And that's what understanding is. Understanding is knowing how to convert the symbols into these vectors and knowing how the elements of the vector should interact to predict the vector for the next symbol. That's what understanding is, both in these big language models and in our brains. And that's an example which is sort of in between. You're staying with the symbols, but you're interpreting them as these big vectors, and that's where all the work is. And all the knowledge is in what vectors you use and how the elements of those vectors interact, not in symbolic rules. But it's not saying that you get away from the symbols altogether. It's saying you turn the symbols into big vectors, but you stay with that surface structure of the symbols. And that's how these models are working, and that seems to me to be a more plausible model of human thought too.

You were one of the first folks to get the idea of using GPUs, and I know Jensen loves you for that. Back in 2009 you mentioned that you told Jensen that this could be a quite good idea for training neural nets. Take us back to that early intuition of using GPUs for training neural nets.

So actually I think in about 2006 I had a former graduate student called Rick Szeliski, who's a very good computer vision guy, and I talked to him at a meeting and he said, you know, you ought to think about using graphics processing cards, because they're very good at matrix multiplies, and what you're doing is basically all matrix multiplies. So I thought about that for a bit, and then we learned about these Tesla systems that had four GPUs in. And initially we just got gaming GPUs
and discovered they made things go 30 times faster. And then we bought one of these Tesla systems with four GPUs and we did speech on that, and it worked very well. Then in 2009 I gave a talk at NIPS and I told a thousand machine learning researchers, you should all go and buy Nvidia GPUs, they're the future, you need them for doing machine learning. And I actually then sent mail to Nvidia saying, I told a thousand machine learning researchers to buy your boards, could you give me a free one? And they said no. Actually, they didn't say no, they just didn't reply. But when I told Jensen this story later on, he gave me a free one.

That's very good. I think what's interesting as well is sort of how GPUs have evolved alongside the field. So where do you think we should go next in the compute?

So my last couple of years at Google I was thinking about ways of trying to make analog computation, so that instead of using like a megawatt we could use like 30 watts, like the brain, and we could run these big language models in analog hardware. And I never made it work, but I started really appreciating digital computation. So if you're going to use that low-power analog computation, every piece of hardware is going to be a bit different, and the idea is the learning is going to make use of the specific properties of that hardware. And that's what happens with people. All our brains are different, so we can't then take the weights in your brain and put them in my brain. The hardware is different, the precise properties of the individual neurons are different. The learning has learned to make use of all that. And so we're mortal in the sense that the weights in my brain are no good for any other brain. When I die, those weights are useless. We can get information from one to another rather inefficiently, by I produce sentences and you figure out how to change your weights so you would have said the same thing. That's called distillation. But
that's a very inefficient way of communicating knowledge. And with digital systems, they're immortal, because once you've got some weights you can throw away the computer, just store the weights on a tape somewhere, and now build another computer, put those same weights in, and if it's digital it can compute exactly the same thing as the other system did. So digital systems can share weights, and that's incredibly much more efficient. If you've got a whole bunch of digital systems and they start with the same weights, and they each go and do a tiny bit of learning, and then they share their weights again, they all know what all the others learned. We can't do that. And so they're far superior to us in being able to share knowledge.

A lot of the ideas that have been deployed in the field are very old-school ideas. It's the ideas that have been around in neuroscience forever. What do you think is sort of left to apply to the systems that we develop?

So one big thing that we still have to catch up with neuroscience on is the time scales for changes. So in nearly all the neural nets there's a fast time scale for changing activities: so input comes in, the activities, the embedding vectors, all change. And then there's a slow time scale, which is changing the weights, and that's long-term learning. And you just have those two time scales. In the brain there's many time scales at which weights change. So for example, if I say an unexpected word like "cucumber", and now five minutes later you put headphones on, there's a lot of noise and there's very faint words, you'll be much better at recognizing the word "cucumber" because I said it five minutes ago. So where is that knowledge in the brain? And that knowledge is obviously in temporary changes to synapses. It's not neurons going "cucumber, cucumber, cucumber"; you don't have enough neurons for that. It's in temporary changes to the weights. And you can do a lot of things with temporary weight changes, what I call fast
weights. We don't do that in these neural models, and the reason we don't do it is because if you have temporary changes to the weights that depend on the input data, then you can't process a whole bunch of different cases at the same time. At present we take a whole bunch of different strings, we stack them together, and we process them all in parallel, because then we can do matrix-matrix multiplies, which is much more efficient. And just that efficiency is stopping us using fast weights. But the brain clearly uses fast weights for temporary memory, and there's all sorts of things you can do that way that we don't do at present. I think that's one of the biggest things we have to learn. I was very hopeful that things like Graphcore, if they went sequential and did just online learning, then they could use fast weights. But that hasn't worked out yet. I think it'll work out eventually when people are using conductances for weights.

How has knowing how these models work and knowing how the brain works impacted the way you think?

I think there's been one big impact, which is at a fairly abstract level, which is that for many years people were very scornful about the idea of having a big random neural net and just giving it a lot of training data and it would learn to do complicated things. If you talked to statisticians or linguists or most people in AI, they'd say that's just a pipe dream, there's no way you're going to learn really complicated things without some kind of innate knowledge, without a lot of architectural restrictions. It turns out that's completely wrong. You can take a big random neural network and you can learn a whole bunch of stuff just from data. So the idea that stochastic gradient descent, to repeatedly adjust the weights using a gradient, that that will learn things, and will learn big complicated things, that's been validated by these big models. And that's a very important thing to know about the brain: it doesn't have to have all this innate
structure. Now obviously it's got a lot of innate structure, but it certainly doesn't need innate structure for things that are easily learned. And so the sort of idea coming from Chomsky that you won't learn anything complicated like language unless it's all kind of wired in already and just matures, that idea is now clearly nonsense.

I'm sure Chomsky would appreciate you calling his ideas nonsense.

Well, actually I think a lot of Chomsky's political ideas are very sensible, and I'm always struck by how come someone with such sensible ideas about the Middle East could be so wrong about linguistics.

What do you think would make these models simulate consciousness of humans more effectively? But imagine you had the AI assistant that you've spoken to in your entire life, and instead of that being, you know, like ChatGPT today, that sort of deletes the memory of the conversation and you start fresh all of the time, okay, it had self-reflection. At some point you pass away, and you tell that to the assistant. Do you think...

I mean, not me, somebody else tells that to the assistant.

Yeah, it would be difficult for you to tell that to the assistant. Do you think that assistant would feel at that point?

Yes, I think they can have feelings too. So I think just as we have this inner theater model for perception, we have an inner theater model for feelings: they're things that I can experience but other people can't. I think that model is equally wrong. So I think, suppose I say, I feel like punching Gary on the nose, which I often do. Let's try and abstract that away from the idea of an inner theater. What I'm really saying to you is, if it weren't for the inhibition coming from my frontal lobes, I would perform an action. So when we talk about feelings, we're really talking about actions we would perform if it weren't for constraints. And that's really what feelings are: the actions we would do if it weren't for constraints. So I think you can give the same kind of
explanation for feelings, and there's no reason why these things can't have feelings. In fact, in 1973 I saw a robot having an emotion. So in Edinburgh they had a robot with two grippers, like this, that could assemble a toy car if you put the pieces separately on a piece of green felt. But if you put them in a pile, its vision wasn't good enough to figure out what was going on, so it put its gripper together and went whack, and it knocked them so they were scattered, and then it could put them together. If you saw that in a person, you'd say it was cross with the situation because it didn't understand it, so it destroyed it.

That's profound. We spoke previously, you described sort of humans and the LLMs as analogy machines. What do you think have been the most powerful analogies that you've found throughout your life?

Oh, throughout my life? I guess probably a sort of weak analogy that's influenced me a lot is the analogy between religious belief and belief in symbol processing. So when I was very young, I came from an atheist family and went to school and was confronted with religious belief, and it just seemed nonsense to me. It still seems nonsense to me. And when I saw symbol processing as an explanation of how people worked, I thought it was just the same nonsense. I don't think it's quite so much nonsense now, because I think actually we do do symbol processing. It's just we do it by giving these big embedding vectors to the symbols. But we are actually symbol processing, but not at all in the way people thought, where you match symbols and the only thing a symbol has is it's identical to another symbol or it's not identical. That's the only property a symbol has. We don't do that at all. We use the context to give embedding vectors to symbols and then use the interactions between the components of these embedding vectors to do thinking. But there's a very good researcher at Google called Fernando Pereira, who said, yes, we do have symbolic reasoning,
and the only symbolic language we have is natural language natural language is a symbolic language and we reason with it and I believe that now you've done some of the most meaningful uh research in the history of computer science can you walk us through like how do you select the right problems to work on well first let me correct you me and my students have done a lot of the most meaningful things and it's mainly been a very good collaboration with students and my ability to select very good students and that came from the fact that there were very few people doing neural nets in the 70s and 80s and 90s and 2000s and so the few people doing neural nets got to pick the very best students so that was a piece of luck but my way of selecting problems is basically well you know when scientists talk about how they work they have theories about how they work which probably don't have much to do with the truth but my theory is that I look for something where everybody's agreed about something and it feels wrong just there's a slight intuition there's something wrong about it and then I work on that and see if I can elaborate why it is I think it's wrong and maybe I can make a little demo with a small computer program that shows that it doesn't work the way you might expect so let me take one example um most people think that if you add noise to a neural net it's going to work worse um if for example each time you put a training example through you make half of the neurons be silent it'll work worse actually we know it'll generalize better if you do that and you can demonstrate that um in a simple example that's what's nice about computer simulation you can show you know this idea you had that adding noise is going to make it worse and sort of dropping out half the neurons will make it work worse which it will in the short term but if you train it like that in the end it'll work better you can demonstrate that with a small computer program and then you can think hard about why 
that is and how it stops big elaborate co-adaptations um but I think that's my method of working find something that sounds suspicious and work on it and see if you can give a simple demonstration of why it's wrong what sounds suspicious to you now well that we don't use fast weights sounds suspicious that we only have these two timescales that's just wrong that's not at all like the brain um and in the long run I think we're going to have to have many more timescales so that's an example there and if you had your group of students today and they came to you and they said so the Hamming question that we talked about previously you know what's the most important problem in your field what would you suggest that they take on and work on next we spoke about reasoning timescales what would be sort of the highest priority problem that you'd give them for me right now it's the same question I've had for the last like 30 years or so which is does the brain do back propagation I believe the brain is getting gradients if you don't get gradients your learning is just much worse than if you do get gradients but how is the brain getting gradients and is it somehow implementing some approximate version of back propagation or is it some completely different technique that's a big open question and if I kept on doing research that's what I would be doing research on and when you look back at your career now you've been right about so many things but what were you wrong about that you wish you sort of spent less time pursuing a certain direction okay those are two separate questions one is what were you wrong about and two do you wish you'd spent less time on it I think I was wrong about Boltzmann machines and I'm glad I spent a long time on it they're a much more beautiful theory of how you get gradients than back propagation back propagation is just ordinary and sensible and it's just the chain rule Boltzmann machines is clever and it's a very 
interesting way to get gradients and I would love for that to be how the brain works but I think it isn't did you spend much time imagining what would happen post the systems developing as well did you have an idea that okay if we could make these systems work really well we could you know democratize education we could make knowledge way more accessible um we could solve some tough problems in medicine or was it more to you about understanding the brain yes I sort of feel scientists ought to be doing things that are going to help society but actually that's not how you do your best research you do your best research when it's driven by curiosity you just have to understand something um much more recently I've realized these things could do a lot of harm as well as a lot of good and I've become much more concerned about the effects they're going to have on society but that's not what was motivating me I just wanted to understand how on earth can the brain learn to do things that's what I want to know and I sort of failed as a side effect of that failure we got some nice engineering but yeah it was a good failure for the world if you take the lens of the things that could go really right what do you think are the most promising applications I think healthcare is clearly uh a big one um with healthcare there's almost no end to how much healthcare society can absorb if you take someone old they could use five doctors full-time um so when AI gets better than people at doing things um you'd like it to get better in areas where you could do with a lot more of that stuff and we could do with a lot more doctors if everybody had three doctors of their own that would be great and we're going to get to that point um so that's one reason why healthcare is good there's also just new engineering developing new materials for example for better solar panels or for superconductivity or for just understanding how the body works um there's going to be 
huge impacts there those are all going to be good things what I worry about is bad actors using them for bad things we've facilitated people like Putin or Xi or Trump using AI for killer robots or for manipulating public opinion or for mass surveillance and those are all very worrying things are you ever concerned that slowing down the field could also slow down the positives oh absolutely and I think there's not much chance that the field will slow down partly because it's international and if one country slows down the other countries aren't going to slow down so there's a race clearly between China and the US and neither is going to slow down so yeah I don't I mean there was this petition saying we should slow down for six months I didn't sign it just because I thought it was never going to happen I maybe should have signed it because even though it was never going to happen it made a political point it's often good to ask for things you know you can't get just to make a point um but I didn't think we're going to slow down and how do you think that it will impact the AI research process uh having uh these assistants so I think it'll make it a lot more efficient AI research will get a lot more efficient when you've got these assistants that help you program um but also help you think through things and probably help you a lot with equations too have you reflected much on the process of selecting talent has that been mostly intuitive to you like when Ilya shows up at the door you feel this is a smart guy let's work together so for selecting talent um sometimes you just know so after talking to Ilya for not very long he seemed very smart and then talking to him a bit more he clearly was very smart and had very good intuitions as well as being good at math so that was a no-brainer there's another case where I was at a NIPS conference um we had a poster and someone came up and he started asking questions about the poster and every question he asked was a sort of deep 
insight into what we'd done wrong um and after 5 minutes I offered him a postdoc position that guy was David MacKay who was just brilliant and it's very sad he died but it was very obvious you'd want him um other times it's not so obvious and one thing I did learn was that people are different there's not just one type of good student um so there's some students who aren't that creative but are technically extremely strong and will make anything work there's other students who aren't technically strong but are very creative of course you want the ones who are both but you don't always get that but I think actually in the lab you need a variety of different kinds of graduate student but I still go with my gut intuition that sometimes you talk to somebody and they just get it and those are the ones you want what do you think is the reason for some folks having better intuition do they just have better training data than others or how can you develop your intuition I think it's partly they don't stand for nonsense so here's a way to get bad intuitions believe everything you're told that's fatal you have to be able to I think here's what some people do they have a whole framework for understanding reality and when someone tells them something they try and sort of figure out how that fits into their framework and if it doesn't they just reject it and that's a very good strategy um people who try and incorporate whatever they're told end up with a framework that's sort of very fuzzy and sort of can believe everything and that's useless so I think actually having a strong view of the world and trying to manipulate incoming facts to fit in with your view obviously it can lead you into deep religious belief and fatal flaws and so on like my belief in Boltzmann machines um but I think that's the way to go if you got good intuitions you can trust you should trust them if you got bad intuitions it doesn't matter what you do so you might as well 
trust them a very good point when you look at the types of research that's being done today do you think we're putting all of our eggs in one basket and we should diversify our ideas a bit more in the field or do you think this is the most promising direction so let's go all in on it I think having big models and training them on multimodal data even if it's only to predict the next word is such a promising approach that we should go pretty much all in on it obviously there's lots and lots of people doing it now and there's lots of people doing apparently crazy things and that's good um but I think it's fine for like most of the people to be following this path because it's working very well do you think that the learning algorithms matter that much or is it just scale are there basically millions of ways that we could get to human level intelligence or are there sort of a select few that we need to discover yes so this issue of whether particular learning algorithms are very important or whether there's a great variety of learning algorithms that'll do the job I don't know the answer it seems to me though that back propagation there's a sense in which it's the correct thing to do getting the gradient so that you change a parameter to make it work better that seems like the right thing to do and it's been amazingly successful there may well be other learning algorithms that are alternative ways of getting that same gradient or that are getting the gradient to something else and that also work um I think that's all open and a very interesting issue now about whether there's other things you can try and maximize that will give you good systems and maybe the brain's doing that because it's easier but backprop is in a sense the right thing to do and we know that doing it works really well and one last question when you look back at your sort of decades of research what are you most proud of is 
it the students is it the research what makes you most proud when you look back at your life's work the learning algorithm for Boltzmann machines so the learning algorithm for Boltzmann machines is beautifully elegant it's maybe hopeless in practice um but it's the thing I enjoyed most developing that with Terry and it's what I'm proudest of um even if it's wrong what questions do you spend most of your time thinking about now is it the um what should I watch on Netflix
