"Advances in Deep Neural Networks," at ACM Turing 50 Celebration

Reddit Comments

It will be interesting to see how this discussion will be judged by the future, say, 30 years from now. What we thought were the limitations, how we were hoping to tackle them.

u/visarga, 8 points, Jul 05 2017

Very nice discussion, especially from Jordan. However, it'll most likely not be very welcome here.

u/KozutheGosu, 4 points, Jul 05 2017

I thought this had really good discussion about the objectives and limits of deep learning by leaders from various parts of machine learning and AI.

u/fiskak, 2 points, Jul 05 2017
Captions
So, our first panel is Advances in Deep Neural Networks. Neural networks can be trained with relatively modest amounts of information and then applied to large quantities of unstructured data. This panel will discuss the current state of neural networks and what changes they may bring as they continue to develop. Our moderator is one of the true pioneers in the area of artificial intelligence; please welcome 2011 Turing laureate Judea Pearl and his panel. [Applause]

Welcome to the panel on advances in deep neural networks. What we are going to do is start by introducing the members of the panel. Please introduce yourself and your relationship to this technology, in any order.

Hi, my name is Fei-Fei Li. I'm a professor at Stanford University, director of the Stanford AI Lab, and I'm currently chief scientist of AI and machine learning at Google Cloud.

I think we'll do it from the seats instead of standing up, because we have mics on. My name is Stuart Russell. I'm a professor of computer science at UC Berkeley, and I work on pretty much all areas of AI except for deep learning.

Hello everyone, my name is Ilya Sutskever. I'm the co-founder and research director of OpenAI. In previous lives I worked at Google Brain, and I was a PhD student of Geoff Hinton. I work on all areas of deep learning, and I don't know areas outside of it.

My name is Raquel Urtasun, and I'm a professor at the University of Toronto. I'm a co-founder of the Vector Institute for AI in Toronto, and recently also the head of Uber ATG Toronto. I work on self-driving cars, machine learning, and computer vision, using deep learning.

Michael Jordan, University of California, Berkeley. I've been doing machine learning for about 30 years, and I'm interested in all areas of it, including deep learning, but not only.

We're going to give each panel member seven and a half minutes to make a statement on his or her perception of where the field is going, and I will start with my own seven and a half minutes. When we finish, we are going to interact informally among ourselves, reacting to each other's statements, and then we're going to open it for questions and answers from the audience.

So let me start my talk by mentioning that, like Stuart Russell, I too am an outsider to this field. I was of course enchanted by neural networks in the 1976 paper by Rumelhart, which got me to think about belief propagation in Bayesian networks, but I left this field early, and I spent my time admiring the advances of the field from the outside. And this is what I'm going to present here today: an outsider's perception of where the field is going.

I left this field partly because I realized that human beings are not good at handling statistical information. Human beings are very bad at processing statistical information and at obtaining statistical intuition; the mode in which human beings work is the causal mode, which means people are very efficient at drawing causal conclusions from causal inputs, and not so good at drawing statistical conclusions from statistical inputs. And when I look at the field of neural networks, at deep learning and the entire machine learning area, I find that it's primarily working in statistical mode: the program improves its performance by optimizing parameters based on a stream of observational data. And that is something which, in our field of causal inference, we learned to avoid and to look down upon, and not only because every statistician already recognizes that correlation is not causation.

We have a very good existence proof, which happened about 40,000 years ago, when our forefathers, Homo sapiens, took over the planet from the Neanderthals and Homo erectus, other species of human beings, by virtue of one computational advantage they had, and this is the ability to build a model of one's environment. This is
also speculation, of course. But something happened about 40,000 years ago which anthropologists and historians find very hard to explain, and which I think we computer scientists can explain. About 40,000 years ago we find the first figurative art which depicts worlds that could not exist: for instance, a combination of a lion head with a human body. The ability to imagine things that do not exist physically can only happen if the organism we are talking about has a representation of its environment, and is able to manipulate that representation internally, by imagination, before it is manipulated in the physical world. And that is one explanation for an accelerated process over normal evolution. Snakes and eagles have, over millions of years, developed superb optical systems, unmatched so far by anything you can do in the laboratory; but that took a long, survival-of-the-fittest kind of evolutionary process, and they ended up with a very specific, superb system. They could not build eyeglasses or telescopes, something that human beings managed to do in only a thousand years. This raises, for us computer scientists, a basic question: what computational facility did we acquire in our cognitive evolution that previous species did not have? The hypothesis which I find very tantalizing is that we had the ability to represent the environment and manipulate that representation, and we actually developed a market of promises: because if you have the ability to predict the future on the basis of manipulating an image of the world, then you can trade with promises, and you can establish large communities on the basis of promises, and so on.

Coming now down to machine learning, we find that you can do very little with correlations, with statistical learning. And we have also found the theoretical justification, which is organized in a hierarchy. On the first level you have statistical reasoning, which can tell you only the likelihood that something happens given that you saw another variable or another proposition: what can a symptom tell you about a disease, or what can the result of a poll tell you about the election? Then you have a second level, which is totally divorced from the first one, and which requires crossing a substantial theoretical impediment. This is intervention: what will happen if I act? What if we raise prices, what if you make me laugh, and so on. That is the second level of the hierarchy. And the third level of the hierarchy is counterfactuals. This is the language used by scientists: what if I had done things differently? Was it the aspirin that cured my headache, or the television program that I watched? Counterfactuals are the top level, the one associated with scientific thinking.

So now, as we look at the workings and the successes of machine learning, we ask ourselves: are they aware of the basic mathematical limitations that were discovered in the causal inference era? How are we preparing to circumvent the theoretical impediments that prevent us from going from one level of the ladder to another? I will end by saying that data science should be a two-body problem: it is a relationship between reality and data. It is not a single-body problem, as it is practiced today. Data alone is not science. [Applause]

So, I made some notes on the way over this morning; thanks to one of the great achievements of computer science in the last 50 years, we can't think without a piece of metal in our hands. I want to pick up on Judea's reference to David Rumelhart. David Rumelhart was the inventor of the modern backpropagation-in-deep-neural-networks idea in the early 80s. He died young from a brain disease. We all miss him; he would, I think, be in this room, and
probably over in this little section right here. I remember I was sitting next to him in the 1980s when he was developing these ideas, and he had Minsky and Papert's book in his hand, and he kept adjusting the program and running over and trying yet another problem out of it. And when he said, yeah, I can do that, and I can do that, and I can do that, I kept asking: well, what can't you do? Because that's often the most interesting thing. So we had some debate about that, and that's a good point of return for my conversation today.

First of all, I don't think machine learning is best thought of as part of AI; I think it's best thought of as part of computer science. All of our infrastructure, all of our systems, should be intelligent, adaptive, and learning, and I think it's actually been a mistake to couch it too much in terms of AI. AI tends to think about how we make a human: how do we do vision, speech, et cetera. But if you think about all of our transportation networks, our logistics chains, our financial systems, our healthcare, there should be intelligence in all of it. It's all computer science, and it all has to use data. That's the kind of thing we should be targeting. If you think that way, there really can't be one architecture of AI, one architecture of intelligence, at least not yet; maybe eventually one will emerge. You can't imagine that layers of matrices trained by stochastic gradient descent are the architecture of all intelligence; that's just horrible. So some of the debates go: well, neural nets can't do this, or they can't do that, and if they can't, we have to cast them aside; there must be something else, maybe it's Bayesian networks, or maybe it's Soar, or something like that. The AI debates then often become a fight over what's the architecture. I don't think that's right. I think it's a toolbox, and neural nets will be forever in that toolbox, but there are many other parts of that toolbox, and you have to think about the whole toolbox. When you build a system like a self-driving car, it's not just a neural net; it's all the engineering around it. Our field is really just emerging, trying to provide all the structure around it, and eventually leading towards an architecture, an operating system; it won't be just a map.

So let's think a little bit about what you can't do very easily with neural nets. You can do pattern recognition, and you can do it at large scale; that's kind of how, sort of surprisingly, we started getting somewhere in natural language processing. There, I think, there's been a bit too much hype, and in fact I think even David Rumelhart would agree there's been way too much hype about that. In natural language processing, when you start to cross over from syntax to the real world, to semantics, to something that's not just interesting but true, that's a barrier we're still not very good at getting past. I was just in China the other day, and there was so much talk about neural nets and deep learning, and it's really, you know, way too much hype. In particular, people were talking about chatbots: how good the chatbots are now, that every teenage kid is using the chatbots. So I looked at them a little bit; of course I can't read Chinese, but, you know, it's just ridiculous. These are systems that are kind of like the keyword-based or regular-expression-based things of the prior past. They kind of see what you're saying and they throw something interesting-sounding back at you, and they keep doing that for a while. When they get confused, because you used a word like "he" or "she" and they have no clue what it refers back to (coreference), they'll just say something else interesting-sounding and go off and talk about some other celebrity's news or something. That works perfectly well for about two weeks, and then the kids get tired of it, because the chatbot hasn't learned anything about them, about the context. The same thing is true about the real world: it's just syntax. All right, so
when you try to get past that and get at something true, you have to ask: what's the metric? It's not just that people listen to it for a long time, that it amuses people; the metric is, can we help people do things? Do you get task completion? If I put a chatbot inside of a program to do travel arrangements for me, do I actually get the actual ticket in my hands? And there it's very, very poor. A lot of the problem has to do with natural language, naturally, which is extremely hard: there's coreference, there's anaphora, there's ellipsis, there's metaphor, and we use it all fluently; it's not just a matter of mimicking natural language. So translation is done pretty well nowadays by kind of neural-net-type architectures, all right, but the error rates are still high enough that every sentence which is about 25 words long will have an error in it. And getting that last part to be good enough that we, and strikingly Turing himself, would agree that we've solved the problem: we're very, very far from that. We don't understand the underlying semantics of what the language is referring to; we're just mapping strings to strings. That can't be the way that intelligence really arises; that's not what translation is. Now, if you go beyond translation to dialogue, you're not going to have enough training data for dialogue. If I even think of a good travel domain, consider all the things like: I said I want to go to Paris, but not through Chicago, because I know there's always trouble; oh, I might want to stop over in New York; oh no, let's not do that, let's back up and do something else. All the paths that we could go down, just in my conversation, and do that over a billion people: we're not getting enough training data for that. So a plain old neural net is not going to do that in ten years. In narrow domains that are useful, maybe you can build a company around them, yeah. So that's kind of what's changed now: in some of these domains we're faking intelligence with neural nets, and we're faking it well enough that you can build a company around it. That's interesting, but somehow not intellectually satisfying.

OK, so how could you say things that are true? That's hard. And how do we get the flexibility of natural language? So let me invent a word right now. I'm going to invent the word "grebe", never been used in English before, and I'm going to say one sentence: "The grebe walked from SFO to the Westin in 30 minutes." All right, sorry, computer scientists always know the little factoids that ruin your examples; forget the exact timing. So I gave you one sentence, and now you know an enormous amount about grebes. I can ask you other questions about grebes and you could answer them, and you can have uncertainty about your answers as well. That's not what neural nets do; you didn't require billions of training examples and going again and again through the training set. So something's missing. Something is missing. OK, so we're in an era of enormous hype about deep learning, and yes, it can do great things, and yes, we can build some companies around it, and yes, it'll change the economy, but we're not there yet. So I would argue that deep learning is deep architecturally: there are layers. But somehow it's not deep in terms of abstraction and semantics. Semantics is about whether things are true; you don't need logic to talk about that, you just need a world, you need to talk about what's true. And abstraction is so important. The grebe example: I just gave you a new concept, and what we're very good at is looking at something and abstracting, and that's what a lot of computer science is about, and neural nets don't do that very well. There are layers, and somehow, in some implicit sense, as you get near the end of all those layers, it becomes a little more abstract, but
that's way too implicit for a science of computing. So, I finished with five seconds left; that's pretty good, huh?

Thank you to the organizers for this panel; it really doesn't get more exciting than this. So I have to say, Mike, I agree with almost everything you said; I grew up reading your papers as a student of machine learning. The one thing I do disagree with is that I do think machine learning is part of AI. AI is a discipline for understanding intelligence and making intelligent machines, and just like physics has a whole set of tools, from calculus to statistical mechanics, machine learning is the methodology we use for solving AI problems. But I agree with everything else.

So I think, undoubtedly, we're living in one of the most exciting and, using Mike's word, hyped eras of AI, mostly because of the recent renaissance and revolution of deep learning. I want to begin my conversation by paraphrasing a quote from Churchill, also invoked by another great mentor and colleague of mine, Jitendra Malik: it's not the end, it's perhaps not even the beginning of the end, but it is probably the end of the beginning. And that's where I think we are as a field of AI, especially propelled recently by the advances of deep learning and machine learning: we're really entering a new phase, but there is a long way to go.

Just a little bit of historical context. Thank you, Barbara, for paying tribute to the founding fathers of AI just 20 minutes ago. AI is a really young field with a very, very audacious goal, as big, I think, as that of physics. Physics is the field that aims to understand the workings of nature and the universe; AI is the field that aims to understand the workings of intelligence and to develop machine intelligence. I see the past 60 years of AI as what we could call AI in vitro: AI developed within the laboratories. The questions were asked, and our founding fathers started exploring a set of tools and a set of problems, even just trying to understand how to measure AI, what the metrics are. But after these 60 years, and especially in the past five to ten years, AI as a field has reached an inflection point, mostly driven by the great advances of machine learning: neural networks being an important part of it, but not only neural networks. A lot of you have worked on graphical models, kernel methods, optimization, support vector machines, and so on. The combination of advances in computer science, especially in hardware, with algorithms, especially machine learning as the language of AI, and with the emergence of the Internet and massive amounts of data, gave us today's AI, where we're doing tremendously exciting things with statistically based smart data analytics. I do think this is a milestone of AI, not the solving of all the problems of AI.

So we've entered a new phase, what I would call AI in vivo: for the first time, companies are founded on the advances of AI and are using AI to solve a lot of the world's problems. Many of us have heard that humanity has entered the fourth industrial revolution, and while I'm on sabbatical at Google Cloud I see every single industry being impacted by the massive amount of data and the capability of data analytics: in healthcare, in transportation, in manufacturing, in financial services, in media and entertainment, in retail and commerce. This is every single industry, and I think this is a very exciting time for AI. AI is also helping to advance science and other disciplines, be it medicine, materials science, neuroscience, and, very quickly, the social sciences and economics. So I do think this is a new phase of AI.

In the remaining two minutes I do want to agree with Judea and Mike that there is a lot more work to be done in AI. This kind of euphoria, that AI has
taken over and we've solved most of the problems: it's not true. For the past almost two decades I have worked in computer vision and machine learning, and one of our projects is ImageNet; a lot of you might have heard of it. It contributed to the advance of deep learning and object recognition. But while we celebrate the successes, we haven't talked about the failures of ImageNet. When we analyze the success stories on the leaderboards of ImageNet, we also see a lot of failures, and it boils down to the lack of ability to abstract and reason. In ImageNet object recognition, we can recognize photos of dogs and baby faces and beaches and food very well, because there is data; but we still miss small objects, and we cannot yet find textural objects. We have not solved the problem, because there are a lot of challenges remaining: the long tail, the lack of data, reasoning.

And finally, I want to touch on this: I think the future of pushing AI forward has a lot to do with cognition. To quote another one of my favorite computer scientists, my colleague and also my next-door neighbor Terry Winograd, in his famous paper "Thinking Machines," which everybody should read, he says that the definition of today's AI is an algorithm that can make a perfect chess move while the room is on fire. We can change the word chess to Go, but it's the lack of contextual reasoning, the lack of cognitive awareness of our environment, the lack of integrated understanding and learning that is still missing. And, as the sign telling me I'm 25 seconds over says, that's all I want to say: it's the end of the beginning; we're entering a new phase of AI. Thank you.

So, I think Michael and Fei-Fei have said a lot of what I was going to say about deep learning. I did want to mention, since we are here to celebrate Alan Turing, that in some sense Alan Turing in many ways was the father of AI, even before Minsky and McCarthy and so on. He was writing about it and talking about it in the late forties and early fifties, and in some sense not just AI but the whole of computer science sprang from one idea that he had: that there is this mathematical object which is more powerful than any other mathematical object, than algebraic expressions, than derivatives, than sets, and that is the program. As Barbara pointed out, the mathematical community simply did not recognize these things as mathematical objects, and to some extent you can't blame them: when you look at a COBOL program, it's hard to imagine that it is a mathematical object. But it really is, and that mathematical object has set off a slow-motion explosion that has ripped apart and reshaped our society and our economy. Universities are very slow, but it's starting to rip apart and reshape our universities as well. And AI is just one facet of the revolution that has come about because of that invention.

Alan Turing predicted that AI would make progress; his timeline was perhaps a little bit off, but not that much. And the title of this panel, on deep learning, represents one breakthrough. We might criticize deep learning, but I think it does represent a breakthrough in our ability to learn complex, high-dimensional mappings, and that in some sense fulfills the early promise of neural networks from the mid-to-late 80s. Back then the AI community was somewhat skeptical, but we thought: well, perhaps the neural network guys will get perception right, the low-level perception, the image recognition, the speech recognition; they'll get that right, and then we'll be able to interface our symbolic reasoning systems to real-world inputs using neural networks as the pass-through mechanism. And to some extent that's coming true now. We're still trying to understand why it is that deep learning systems are successful, and
there are the beginnings of some theoretical progress on understanding why they work. But I would say there's also still a huge amount of what people call graduate student descent, where you just spend graduate students and more graduate students on making small changes to the architecture, the training algorithm, and the parameters to try to get better performance, and that's a disappointing form of cookery. I hope we'll get through that phase pretty soon.

So I do worry that there's an overemphasis on the idea that big data and deep learning will solve all of our problems. You can look back to the 1960s, the early stages of AI, where people showed that, for example with the General Problem Solver that Allen Newell and Herb Simon developed, they could solve problems with solutions of length five or length six, and then they thought: OK, well, that's it, it's just a matter of scaling up. And they neglected this little problem of exponential computation. I think the deep learning community is neglecting this little problem of exponential data. As Michael pointed out, there are lots of problems where the amount of data you would need to train a tabula rasa feed-forward network to solve a particular class of problems would be, you know, 10 to the 75, 10 to the 5 million; the amount of data would dwarf the size of the universe, and we're never going to get that much data.

And there are many things that deep learning lacks. Deep learning systems are tunable circuits, and circuits lack the expressive power that we have in programming languages and first-order logic, the Turing-equivalent kind of expressive power. They lack the compositional, declarative semantics that make database systems and logic programming and knowledge-based systems useful. And they lack prior knowledge. So when humans learn, for example when humans are examining the output of the Large Hadron Collider, which is 50 terabytes per second, and discovering the Higgs boson in that data, they're not approaching it as tabula rasa machine learning systems. The guy who cleans the floors in the CERN lab does not understand all the data coming out of the Large Hadron Collider, because he doesn't know enough physics. If you didn't know any physics, you could not possibly understand that data, and a deep learning system will never discover the Higgs boson from the raw data by itself.

I think there's a growing understanding of this. I don't think the deep learning people are naive (Ilya will contradict me), but I think there's an understanding that we need somehow to combine these elements, the declarative semantics, the prior knowledge, the capability for inference, with this really remarkable ability to learn high-dimensional mappings from data. And I think there are many angles from which one could attack this problem. It might be that by taking deep learning architectures and gradually adding more structure, and figuring out how they can contain declarative information and combine it, that might be one direction. There's another field which has been going on a little bit under the radar, called probabilistic programming, which takes the ideas that Judea had about compositional probabilistic models and combines them with the idea that Turing had about Turing machines, and lets you write probability models in an extremely expressive formal language. That does have, in some forms, declarative semantics, compositional semantics, and does do inference. And in fact these two fields are not that different. Although we typically show the power of probabilistic programming as, you know, here are ten lines that do real-time tracking of objects in video, when that ten-line program is running it's generating what we think of as a deep generative network
with tens of millions of variables but also the structure that network as the computation runs and as data arrives the structure that network is is changing you know it's continually adapting to the data exploring different structural hypotheses so in some sense it's like deep learning except much much much more capable and flexible so it may be that coming from that angle and adding in some of the deep learning mapping capabilities we will get some progress there so I think there's a lot to do in AI and and Alan Turing predicted that eventually we would achieve and exceed human level AI and then he said well that's a pretty much the end of the human race you know if we're lucky we'll be able to switch them off but probably not and so I think we have at least half a dozen major breakthroughs to go before we get close to the level III but there are very many very brilliant people working on it and I'm pretty sure that those breakthroughs are going to happen so I'm devoting my life these days to figuring out what to do about that and how to make sure that things go well on the future thank you okay so now this will be able to point for me to go and I want to tell you why I like deep learning by finding the field to be very philosophically satisfying and explain and try to explain give some not very widely known explanation for why it works as well as it does so the thing about deep learning that makes it work well if the hypothesis class so in machine learning when you want to learn something interview need to make some non-obvious prior assumptions about what your function should be the prior assumptions we make with deep learning is that you have a circuit and circuits are a pretty good object to learn if you have a really deep circuit it can do a lot of complicated computation I think it is not a well-known fact that they've modestly sized neural network Tony has two hidden layers can sort n n bit numbers so the hypothesis class is very powerful without requiring 
a lot of parameters at the same time so the really amazing fact about deep learning which is powering all the advantage that we see is that you can actually find the circuit those circuits automatically from data to creating sense it violates all theory it exceeds gradient descent on these circles exceeded all expectations and it's an empirical fact and it's really amazing there is one other thing about deep learning which interesting where there is an argument by which we could have predicted the deep learning would have succeeded at its core perception and album goes like this if you look at if you just a bit of introspection and you look at our vision and we look at our speech you notice that we can see and we can hear things really fast so you look at something in a fraction of a second you know what that is but I knew runs are so slow the fire is most advanced per second and usually much less well this suggests that the process of vision and Creech and some expert perception if you ask experts experts often know what to do in complicated situations right away this is a massively parallel process that doesn't require many steps so kind of like a deep neural network a very large deep neural network and so well then what because their question becomes can we take the biggest liberal networks which we can can you get the biggest computer we can and train the biggest neural metals if you can and see how I was going to do and turns out that this works well enough well enough to be useful for a lot of tasks the other thing which is interesting about deep learning is that these models are so hard to understand you take I think vision is a good example I can speak for myself when I was thinking what you know what kind of computer program could possibly solve vision it was just an incomprehensible problem a program what kind of code what kind of computer code could possibly do that well if you assume that these perception problems are truly difficult to comprehend and 
you notice that neural networks are also difficult to comprehend, then it's actually pretty good: it's like fighting fire with fire. It means you found an incomprehensible solution to an incomprehensible problem. I think we can make a good analogy to evolution by natural selection. Just as with machine learning, where we have a pretty good understanding of the process of learning and can make certain statements about what the result will be, so with biological evolution: we can predict that you'll have certain organisms, that they'll reproduce with high probability; certain statements can be made even though it's very hard to understand how biological organisms work. I think a similar thing is happening here. Then there is a last thing I want to highlight, which is really important: compute. Compute has been the oxygen of deep learning, and if you think about it, that makes perfect sense. Go back to the early nineties, when people were excited about neural nets and saying, we discovered back-propagation, we're going to solve everything. Well, your computer was so slow that training your net might take fourteen years; it's not going to do much, no matter how good your machine learning algorithm is, no matter how smart the people who set it up. Success is just not possible. Now computers are fast; we have GPUs. A new NVIDIA GPU can do 100 teraflops. Can you believe that, one GPU, 100 teraflops? That's pretty good, and now you can train big neural networks and they can do more. The reason I feel a lot of excitement about the future is that even though Moore's law is dead and our laptops aren't getting much faster, neural network hardware is very much alive, and progress there is very rapid and exciting. I expect that in the next few years you will see some truly unimaginably fast computers, which
will in turn allow unimaginably exciting progress. So, in short, a lot of things are possible. The reasons neural networks work are, first, that they're circuits, and finding the best circuit is a pretty good thing to do; and second, that we can look at human beings, look at the tasks they can solve in a fraction of a second, and from that make inferences about the kinds of tasks these tools will be able to solve. Compute is really what's driving the whole thing, and more compute will result in better and more amazing applications. The one point on which I agree with some of the things people have raised is that the hypothesis class of neural nets, if you think about it, is just the raw circuit; it's not the final hypothesis class. The final hypothesis class is programs: if you could find the best program, then you're truly done. We cannot find the best program yet, but hopefully we can wrangle these models so that the things they produce will be a lot like a computer program, one way or another. That's all I have to say, thank you very much.

Raquel Urtasun: It's always difficult to go last, since my colleagues have already said basically almost everything I wanted to say, but I will try my best to keep you entertained before the break. I want to start maybe by reiterating something. We're talking about the breakthrough of neural nets, and it's actually quite interesting: we are not really talking about a technical breakthrough here. It's not that suddenly we have amazing new algorithms that we didn't have before. There have been a few algorithmic tricks that have helped make training better, but there has been no fundamental change in the last twenty-five years or more. So why are we here? Why are we talking about a breakthrough of, I guess, deep learning, now rebranded as AI, to the despair
of some people in the field? So what came together? Well, these algorithmic tricks; the availability of a lot of labeled training data, like ImageNet for example, that the field developed; and better hardware, which made it possible to train these big models at a large scale. And something we shouldn't forget: the hard work of the graduate students, like Alex and Ilya, who showed really impressive results in computer vision; the field turned after that. It's quite impressive that they were able to convince the whole community to pay attention to these models. So why are we talking about a breakthrough? I think the real reason is that these systems have enabled applications in domains that before we didn't think of as places where machine learning could make an impact. Whether it's health, whether it's transportation, whether it's the most boring application you can think of, you see machine learning and AI, i.e., deep learning, almost everywhere. And, as Mike was saying, sometimes you can build a model that is very good at faking, and it's okay if it doesn't behave exactly the perfect way; you can get away with it. Now, I work in self-driving cars. This is a safety-critical application, so you cannot fake things; you actually have to make sure that the system is robust and is not going to make wrong decisions, and this is quite challenging for neural nets. For example, one of the issues is that they don't really model uncertainty well. They will tell you there is a car there, for example, with 99 percent probability, and they will tell you the same whether they are right or wrong. Most of the time
they're right, but when they are wrong, this is a real issue, in this case for self-driving. You cannot deploy a system that has false positives and just starts braking on the highway at 120 km/h, whatever number of miles that is. That's an issue that I think we should think about, and it also links to better theoretical understanding of neural nets: can we have confidence intervals on the errors they're going to make, and can we build systems that are robust based on this? Another interesting thing that I haven't heard my colleagues bring up is whether, now that we see a plethora of applications of deep learning, legislation is going to catch up with the progress of technology. Self-driving cars are one of the examples: are we ready to deploy these things, when will we be really ready, and what do we have to change in order for that to benefit each one of us? In an academic environment we usually don't think about this, but technology has made so much impact in industry that we really need to. Another thing that I think is important, and a few have mentioned this, is that with traditional neural nets we're forgetting about modeling, we're forgetting about prior knowledge. How can we encode the world in a way that is more interpretable and gives us confidence in the systems? Perception is one of the canonical examples where there is a lot of room to incorporate prior knowledge and good modeling while at the same time learning these representations, because we don't have good abstractions of pixels.
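The calibration issue Urtasun describes, a detector that reports 99 percent confidence whether or not it is right, can be made concrete with a small sketch; all numbers below are invented for illustration, not from any real system:

```python
def calibration_gap(confidences, correct):
    """Gap between average stated confidence and empirical accuracy."""
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(correct) / len(correct)
    return abs(mean_conf - accuracy)

# A hypothetical overconfident detector: it always says "car, 99%",
# but is actually right only 8 times out of 10.
confidences = [0.99] * 10
correct = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

gap = calibration_gap(confidences, correct)
print(round(gap, 2))  # 0.19: the detector overstates its reliability
```

A well-calibrated system would have a gap near zero in every confidence bin, which is what would let a planner decide when braking is actually warranted.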
Looking at things like graphical models and neural nets, and how these things can tie together, is I think a very interesting direction for the community. Something I also wanted to bring up is that as more and more people are actually using deep learning, we should think about education. Are people aware of the underpinnings of the technology they are using? One canonical example is fairness. When people upload their own data to the cloud to train systems, do they understand that those systems can actually have biases? As we deploy this in applications that matter for our everyday lives, do they understand that these systems are going to make decisions that will impact certain sectors of the population, perhaps not in the right way? As educators, we should really be thinking about how we make sure that people understand what these models can and cannot do, and what the potential caveats are. I wanted to finish with a positive thought, since I said a lot of negative things, about the future of deep learning and its applications, self-driving cars being just one really exciting area. Transportation worked well in smaller cities, but our cities are growing, getting much bigger, and there are a lot of issues that come with this; I think machine learning can really help us have a much better life. With self-driving cars, for example, we can save lives: I think it's 1.2 million deaths per year on the roads in the world, a tremendously high number. We can reduce congestion and pollution; a study says that only one-tenth of the cars
will be needed. But it's not only about building technology; we should also think about the way we use the technology. If each one of us is just going to use a self-driving car individually, we are not going to help our planet; that is not the way to go. We should think about public transportation, car sharing, ride sharing. And it's not just about transportation; it's about everything. Our planet has limited resources, and we should think about how new technology can help us make the world a place where everybody wants to live.

Judea Pearl: I'll start our discussion by trying to summarize what I got from the panel so far. I'm glad to see that the people who are deeply involved are becoming aware of the limitations of these techniques, and that they characterize it as a tool. It's a tool; it's not AI, not human-level AI; it is a tool to be used in the context of other tools. But as a part-time theoretician I would like to ask for a clearer definition of the boundaries, if you are aware of them, if you can tell us more about the boundaries of the capabilities of machine learning. Am I right in assuming that the boundary I outlined here, namely going from distributions to interventional questions to counterfactuals, is the boundary that you are worried about? Or is there something more? In my simple-minded view, machine learning is a tool to get from finite samples to a distribution, and then we still have two more steps to go from the distribution, the joint distribution function, to real intelligence: one is experimentation, and the other is counterfactuals. Am I right in segmenting the world that way, or does this have nothing to do with the way you see it?

Michael Jordan: This is nice. I think the whole big problem is decision-making under uncertainty, and the part where you need a mapping from A to B is just one little
part of that. So yes, you have to worry about counterfactuals and what might be; you have to think about rollouts of what's not working and what might be working; there's a whole system you have to build. You have to think about the economic issues. One of the things I find exciting, with all the data around, is that we can create new markets to link producers and consumers. What I'm involved in right now is music: there are tons of people who love to listen to music and tons who like to make it, and no one's making money off of it. Well, there's all this data; we can bring them together and let each musician know who likes them, and create all these things. That's huge computation meets data analysis meets prediction, with many decisions being made in parallel. And whenever you're making decisions, you have to think about the causal side, because if you wiggle something and it doesn't have an effect, you can't actually take an action; you can't have control over the environment. So why would one ever separate out the learning part over here, with a boundary around it, and the decision part over here, and the counterfactuals over here, and the Monte Carlo testing of multiple things over here? It's all part of an integrated, systems way of thinking.

Fei-Fei Li: I also want to point out one of the fascinating challenges of working in AI: we as a civilization have very little understanding of what human cognition is, and that's actually a really important sister field of AI. Because of that, we're inspiring each other and learning from each other, but both fields are at the very beginning. So when you talk about the boundaries, we don't even have a clear definition of what human cognition or human intelligence is. It's much easier to define boundaries for artifacts like deep learning; that's something
that we have created, so we can analyze it and come up with the boundaries.

Michael Jordan: Maybe one remark to make is that neural nets use back-propagation, which is the chain rule of calculus, to go back and change all the parameters everywhere. That's non-modular: you get an error out here in some part of the system, and you percolate everything back and make little changes everywhere. That's kind of antithetical to computer science modularity, to keeping things separate. So that's just a fact we have to face: the systems that give the highest performance are going to be systems that are not very modular, not easy to understand, not easily diagnosed. That doesn't mean one approach is right or wrong; it's just that we have to face the issue, going forward, that performance and explainability are a bit in trade-off. And thinking about how humans think about the world: we don't really separate one part from another; things over here can influence things over there, and we're kind of happy with that. So keeping things in simple modules is something we may have to slowly release as part of systems that really learn from data.

Ilya Sutskever: I have two things to say, first about modularity and second about how to think about the limits and the progress of the field. I actually like the comment about modularity. I think there is a bit of a nuance there: the end systems that you learn do tend not to be particularly modular, but now we have the ability to create these great building blocks which we can compose together to create whatever we want, and then train it. It's pretty convenient: you can take any kind of data format on the input side and almost any data format on the output side, set it all up, and the system is going to work.
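Jordan's point, that back-propagation is just the chain rule of calculus with a single output error percolating back to every parameter, can be seen in a two-parameter toy network. This is a sketch for illustration, not any particular library's API:

```python
# Toy two-"layer" network y = w2 * (w1 * x), squared-error loss.
# One output error produces a gradient for every parameter: nothing
# stays local, which is the non-modularity Jordan describes.

def forward(x, w1, w2):
    h = w1 * x            # first layer
    return h, w2 * h      # hidden value and output

def backward(x, w1, w2, y, target):
    dL_dy = y - target            # dL/dy for L = 0.5 * (y - target)**2
    dL_dw2 = dL_dy * (w1 * x)     # chain rule, one step back
    dL_dw1 = dL_dy * w2 * x       # chain rule, all the way back
    return dL_dw1, dL_dw2

x, w1, w2, target = 1.0, 0.5, 0.5, 1.0
h, y = forward(x, w1, w2)                 # y = 0.25
g1, g2 = backward(x, w1, w2, y, target)   # both gradients are -0.375
```

Note that the gradient for w1, deep inside the network, depends on w2 downstream of it; changing any part of the network changes the updates everywhere else.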
What's interesting is that all these big applications are driven by the same set of algorithmic insights, constructed from the same set of ideas, and what that means is that any time anyone makes an advance, it often goes in at the level of the algorithm and expands the boundary of what is possible, not just in one application but in all applications in general. So it is true that some kinds of boundaries exist right now; they are a little blurry, and they're still being pushed, and I think it's hard to tell how far they'll go with just the current set of ideas, excluding any new ones.

Stuart Russell: I think it's instructive to look at AlphaGo. Everyone's very impressed by the fact that AlphaGo is now wiping the floor with every single human being on earth, and it's often described as a deep learning system, but actually it isn't; it's a very classical system. It would be easily understandable by Arthur Samuel; it's essentially an improvement on Samuel's system from 1957. It has several components, and the deep learning part of it is just one, well, I guess two, of the components. But perhaps the core component is the fact that AlphaGo knows the rules of Go, in the sense that for any given position it knows what the legal moves are and it knows what the next state will be when each of those legal moves is executed. That is, coming back to Judea, you get a point, the causal theory of the domain, and it's written by hand in C++ or something; it's not learned. The learning part of it probably wouldn't work: it involves transitive closure, because the liveness of a Go piece depends on its connectivity to all other Go pieces, and transitive closure is hard to do in a six-step circuit. So potentially you could learn the rules of Go by experimentation, but in fact they found it
much easier to use a nice, expressive programming language to write those rules, because as we know, the rules of Go are pretty much translation-invariant across the board, and they're also time-invariant. So there's that, and then there's forward search, the exploration of future possibilities, which again is a very classical idea going back at least to Aristotle and maybe earlier. I think this is emblematic of what we're going to see going forward in the way we build systems: not just end-to-end. If you literally took seriously the idea of end-to-end deep learning of Go, you would have to learn a policy: you would take millions and millions of Go boards, take the move that was made by some expert program or by a human Go master, and learn a policy that maps from boards to moves. But that is not how it works; that does not, as far as we know yet, work very well. People have tried that approach in backgammon, for example, and it didn't work very well; it doesn't work for chess at all. So it's this breakdown of the decision problem into its underlying elements, the transition model for the domain, the ability to look forward in time, and the modular knowledge of how to represent the transition model, that I think is crucial, and those elements are not end-to-end learning.

Judea Pearl: Related to that, and to what I was trying to introduce here: can AlphaGo take advice from a human being? "You shouldn't have done that; you would have been better off had you taken this move." Would it be able to understand this kind of advice, or perhaps we need a whole new architecture to understand such advice? That's my question.

Michael Jordan: It's not an architecture; it's a problem to be solved. If you sat down and wrote AlphaGo as a program, the way Stuart was talking about, you do a lot of Monte Carlo rollouts, you build the policy up, and you build a system that does that. Reinforcement learning is the underlying technology that does this.
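Russell's decomposition, a hand-written transition model plus forward search over future possibilities, can be illustrated on a game far simpler than Go. The choice of game here is ours, purely for illustration: exhaustive search on a toy Nim variant (take 1 or 2 stones; whoever takes the last stone wins):

```python
from functools import lru_cache

def legal_moves(stones):
    # The hand-written "rules of the game": the transition model.
    return [m for m in (1, 2) if m <= stones]

@lru_cache(maxsize=None)
def wins(stones):
    """Forward search: can the player to move force a win?"""
    # A position is winning if some move leads to a losing position
    # for the opponent; with no moves left, you have already lost.
    return any(not wins(stones - m) for m in legal_moves(stones))

# Known result for this Nim variant: multiples of 3 are losses
# for the player to move.
losing = [n for n in range(1, 10) if not wins(n)]
print(losing)  # [3, 6, 9]
```

The rules fit in two lines of an expressive language, while the search does the rest, which is Russell's point: learning those rules end-to-end from board data would be far harder than writing them down.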
It's kind of dumb, you try things out, but then there's kind of smart engineering to make it not intractable. There's something called apprenticeship learning in reinforcement learning, another part of the field, where you do take advice from a human: you watch a human doing it, and you learn from that. So it's all a big toolbox. It's not that you're building an architecture that can or can't do something; it's that you sit down as an engineer and say, I want to make a helicopter fly, I want to play Go. We're not building a single device that does all those things; maybe in 500 years we'll do that. Right now what we've got to develop is an engineering side of our discipline that thinks about how to build a system: here are the goals of the system, here are my guarantees, and here's how I build it. We're so far from that that we have to get real; we have to bring a systems way of thinking into machine learning. There are a lot of systems people in the audience, and I think that your way of thinking, together with the mathematics of machine learning, that's the future.

Judea Pearl: Do you agree with what they've been saying?

Raquel Urtasun: I think we shouldn't believe that deep learning can learn any program, learn anything whatsoever. It has very high capacity, but you still need to be able to get to that solution. For example, one can do a demo that works pretty well where you input images and you output the steering wheel, and that's fine for a demo, but it's not going to solve the task, where you have to work at every point in time, and when you make a mistake you need to explain why you made the mistake. This is why you then need to interact with modeling, and at the same time learn your neural net components; they should be aware of the whole system, and that's the interesting, difficult part in all this.

Judea Pearl: Unless you have other comments to make, we are going to open the
floor for any questions from the audience. No questions? Wow. Here's one: education was mentioned; what should we be teaching, or learning ourselves, to make the most significant advancements? Any comments on that?

Fei-Fei Li: I'll just say one thing. I just finished teaching the largest neural network class at Stanford, 717 students. We teach neural networks; we teach calculus, the chain rule; we do Taylor expansions; we teach back-propagation.

Judea Pearl: But what is not true about neural networks? What I'm saying is, teaching is conveying information, and information is about what cannot happen in the world. So tell me about neural networks: what cannot happen in a neural network?

Ilya Sutskever: One thing about this kind of statement: a neural network is a circuit of a certain limited size, and that question is related to the question of what a circuit of a certain size cannot do, and those of us who are more on the theory side know that these questions can be extremely difficult. This is precisely why it is so hard to delineate precise boundaries of what can and cannot be done. There's a lot of experimentation going on, and oftentimes you think that something cannot be done, and then later someone figures out how to do it.

Stuart Russell: I'm rewriting the AI textbook right now, trying to figure out how to incorporate deep learning, and we have a real problem, because we have a vision chapter, and Ian Goodfellow is writing the deep learning chapter, and I suspect the vision chapter is just going to say "see chapter 19." And then we have a section on speech recognition that is going to say "see chapter 19." This worries me, because then you don't really understand speech: your approach to speech is to get lots of speech data and train a deep learning system, and
your approach to vision is to get lots of vision data and train a deep learning system. I think that's really selling the students short on understanding the speech problem, how it connects with language, how speech is produced, the structure of sound, and the same with vision. So we're really struggling to figure out how to retain the core content of these areas, even though the fact is that, right now, by far the best speech recognition systems are end-to-end deep learning with really no internal structure. All the stuff we used to teach about hidden Markov models, acoustic models, mixtures of Gaussians, all of that has gone by the wayside in practice.

Judea Pearl: I hope you have a good chapter writer for vision.

Fei-Fei Li: That's exactly what we do in the lab: we talk about what we cannot do in vision, like the whole problem of 3D understanding, and even object recognition, because a lot of people think that classifying an image solves vision, and that's not true at all. We all know the open problems of vision. We don't even understand objects themselves: the parts, the relationships, the affordances, how things can be manipulated, the textures, the transformations, all kinds of invariances. I think it's really important that in the days of deep learning, especially the hype days of deep learning, we actually have that discussion and analysis.

Judea Pearl: Are you volunteering? Thank you.

Michael Jordan: Just a brief comment; I'll take the opportunity to say something about the curriculum in general, not just deep learning. CS curricula, I think, need to be revisited to have more statistics, what I like to call inferential thinking, as a counterbalance to computational thinking. If you're doing an A/B test in industry, which people do all the time, you're trying to think about what could happen next, what's behind the data; that's a form of inferential thinking. Error bars and confidence intervals are about things that could have happened but didn't happen; you have to protect yourself against them.
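The "do it not just once but a hundred times in parallel" recipe Jordan goes on to describe is essentially the bootstrap; a minimal sketch with made-up measurements:

```python
import random

def bootstrap_interval(data, stat, n_resamples=1000, seed=0):
    """Approximate 95% interval for stat(data) by resampling with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        stat([rng.choice(data) for _ in data]) for _ in range(n_resamples)
    )
    lo = estimates[int(0.025 * n_resamples)]
    hi = estimates[int(0.975 * n_resamples)]
    return lo, hi

# Made-up measurements; each resample is independent, so they could run
# on separate machines, which is the natural cloud/parallel match.
data = [2.1, 2.4, 1.9, 2.6, 2.2, 2.0, 2.5, 2.3]
mean = lambda xs: sum(xs) / len(xs)
lo, hi = bootstrap_interval(data, mean)  # an error bar around the sample mean
```

Each resample is an embarrassingly parallel job, so a student who knows map/reduce already has the computational half of this inferential idea.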
Too few of our computer science students ever see those concepts. One way of doing what I'm talking about is the cloud: you do things not just once but in parallel, a hundred times, and you can get an error bar. There's a natural match between some of the ways we think in computer science and the needs of inferential thinking, and I think our CS curriculum needs to be revised from the bottom up to start introducing those concepts. You can go through a whole CS curriculum and see a little probability but really no inference, and that's something we have to change. That's what I do when I teach things like neural nets: I say it's part of a bigger picture of statistical decision-making and inference. I think it's very important to make sure that everybody has that bigger perspective, and that it's not all deep learning, which is unfortunately what is happening in many, many places.

Judea Pearl: For the rest, I want to apologize for my biased selection of the questions, which is affected by the legibility of the handwriting. Here is one, if I can read it: how and why should people trust a deeply-learned system that makes a decision? How can we, why should we, trust a recommendation made by a deep learning system?

Michael Jordan: This allows me to bring in another theme that I like to raise, which is databases. We can't divorce this style of thinking from databases. Databases know, for example, about provenance of data, and I need to be able to track where a data point came from before I trust an inference made by a box that was trained on that data. In medical decision-making, you have all this data which was collected in some era; a learning or statistical system was built based on that,
and then ten years later the inference system is still being used, even though the measurements may have changed, so the data is actually out of date. If we don't build systems that take that kind of provenance into account, so that you can say you can trust this inference not just because it's a deep learning system or whatever, but because it came from this place and this place, which makes the inference relevant, then you're not building an overall system you can actually trust at all.

Raquel Urtasun: Many fields are pretty demanding that machines making decisions be capable of explaining why they make those decisions, so you need to think about whether we are being extremely hard on machines, whether they are held to the same standard as humans.

Judea Pearl: Here's a question: can you discuss the issue of mental modeling? For example, if I ask how do you fix a broken phone, you can answer it without breaking your phone. Anyone want to comment on that? I mentioned the ability to have a mental representation of the environment as one of the greatest breakthroughs in the computational capacity of Homo sapiens. So yes, if you have a causal representation of your environment, you should be able to manipulate it in your head before you do it physically; that is the great advantage of having a mental representation, and this is one element which I do not find in neural nets. So evidently that is something to work on. Another one: there is a question here about specialization. The machine learning tools we consistently find today are specialized; do you think a general-purpose machine learning AI is really feasible?

Ilya Sutskever: I can say a few words on that. Special purpose versus general purpose really depends on the problem you're trying to solve. Many problems we're trying to solve are indeed very specialized, and so for those problems
we should use the most specialized approach possible. But as we set our sights on more ambitious goals and more ambitious systems, say, an automated mathematician, such a system would need to be quite a bit more general than a system which does something quite narrow. It's not quite feasible today, but as our computers inevitably get much faster, these kinds of systems will become at least possible to design: systems which, you could imagine, if you just give them the right experiences, will do what you want.

Judea Pearl: Anyone else, on the question of generalization?

Stuart Russell: I think the way to approach it is to always play devil's advocate with yourself and say, okay, here's something that I think is more general than something else; where does it break? Take, for example, DeepMind's DQN system, which is somewhat general: it learns to play a wide range of video games purely from the visual input provided by the game, and it's held up as one of the first examples of something that moves towards general-purpose AI. It plays driving games, ping-pong games, mazes, all of it. If your baby learned to play those games at a superhuman level two hours after birth, you'd be pretty terrified. But when you actually ask, does that technique generalize, well, the number of time steps the system is capable of handling, in terms of the timescales of rewards, of realizing the benefits of an action, is in the few tens. When you translate that to human physical action and the scales we operate on, which can mean deciding to go to a conference months in advance, where the conference itself is hundreds of millions of physical actions, and we make decisions on the scale of billions or even trillions of physical actions, it doesn't possibly scale up;
the approach does not work; it fails; it breaks. Then you have to say, okay, what could we do to get past that bottleneck? But there is not an infinite number of such bottlenecks, and we will pass through them eventually.

Judea Pearl: Thank you, and thank you, members of the panel, and thank you to the audience for being so patient with us, and thanks for the opportunity to present these ideas. Let's have a round of applause for the panel.
Info
Channel: Association for Computing Machinery (ACM)
Views: 11,266
Rating: 4.9351354 out of 5
Keywords: Judea Pearl, ACM Turing AWard, Turing Award, Neural Networks, Deep Learning, Machine Learning
Id: mFYM9j8bGtg
Length: 75min 46sec (4546 seconds)
Published: Fri Jun 30 2017