The Great AI Debate - NIPS2017 - Yann LeCun

Video Statistics and Information

Captions
Welcome, everyone. Good evening and welcome to the first NIPS debate, which promises to be a classic in every sense of the word. The proposition for this debate, meaning the statement that's going to be debated, is the following: interpretability is necessary in machine learning. So let's meet our debaters. In the first corner, on the affirmative side of the debate, measuring in at a height of 5 foot 7 and 200 pounds: Rich Caruana. Rich is a senior researcher at Microsoft Research. His current research focuses on learning for medical decision making, transparent modeling, deep learning, and computational ecology. His partner in this debate of the decade, weighing in with over 11,000 citations, give it up for Patrice Simard. He's a distinguished engineer at Microsoft Research, as well as deputy managing director at Microsoft Research and founder of the Computer-Human Interactive Learning group. Patrice's current research is focused on making machine learning widely accessible, for replicating tasks easily done by humans. And in the second corner, on the negative side of the debate, you've heard him talk already today: the king of calibration, fearless scaler of softmax temperatures, Kilian Weinberger. His research focuses on learning under resource constraints, metric learning, machine learning for web search ranking, computer vision, and of course deep learning. And finally, you may never have heard of this person, but weighing in with an h-index of 97 is Yann LeCun. He is Director of AI Research at Facebook and Silver Professor of data science, computer science, neural science, and electrical engineering at New York University. His current interests include AI, machine learning, computer perception, mobile robotics, and computational neuroscience. The format of the debate has been pre-approved by all the participants. The debate will start with five-minute introductory statements from all teams; then each team will take turns asking a member of the other team a question. Each question takes a maximum of one minute and each answer a maximum of three minutes, so we'll keep questions short and to the point. Finally, we'll end with a question from the moderators. Just to remind everyone, the proposition for the debate is that interpretability is necessary for machine learning. For the proposition we'll have Rich Caruana and Patrice Simard; against the proposition we'll have Kilian Weinberger and Yann LeCun. We'll start with Rich Caruana. Let the fun begin.

Okay, I want to start off by saying that despite what you may have heard, I actually like deep learning, and I agree there are many settings where a deep model, trained on enough data and exhibiting enough accuracy on test data, is all you really need, and you don't need to be able to see what's inside that black box. However, there are other settings where that's a very risky thing to do, even if you've got what looks like superhuman accuracy on the test data, and I'm just going to give you one example of that. Some of you have probably heard this example already, so I hope I don't bore too many of you: the pneumonia risk prediction problem. Let me summarize something that we've learned with real medical data. Bad news: all of you have pneumonia. But good news: we've trained a very accurate neural net to distinguish between those of you who are high risk and need to be hospitalized and those of you who are low risk and can be treated as outpatients. Now, unfortunately, we think this neural net has learned some
bad things, and the reason we think that is that a rule-based system trained on the same data learned some unusual things, one of them being that a history of asthma seems to lower your probability of death from pneumonia. So asthma is, like, good for you if you've got pneumonia. Now, doctors assure us that's not the case, that asthma is actually bad for you if you've got pneumonia, as I think many of you would expect, but it's a real pattern in the data. The reason it's a real pattern in the data is that asthmatics pay a lot of attention to how they're breathing and they're already plugged into healthcare, so asthmatics tend to notice the symptoms of pneumonia earlier, they call their doctor earlier, they get an appointment earlier, and they get treatment earlier. Getting rapid treatment is a really good thing if you've got an infection like pneumonia, so it turns out it is true that the asthmatics in the data set actually do have less chance of dying, because they get to good care faster. Unfortunately, if we use the model to help decide who should be admitted to the hospital, or who should get an early appointment at the doctor's office as opposed to "come in next Thursday," or who should receive different kinds of treatment, we might end up hurting asthmatics, because we might not give them the sort of rapid, aggressive treatment that actually explains why they appeared low risk in the data set in the first place, and that could hurt them. Okay, so what's going on here? The problem, ultimately, is that we want to intervene in people's healthcare, and we want to do that without a causal model. We're going to use good old standard correlational machine learning, supervised learning; we're going to learn something about patients' risk and then try to use that risk as a way of making decisions about patient care. We're not going to do the proper counterfactual thing, and that's a really risky thing to do. But the truth is, that's what we're doing a lot in machine learning: we're often training our model and then using the predictions from that model to make some decision in the real world, and very rarely do we actually go through and collect exactly the right data to be able to properly assess whether the model will predict the right thing. In fact, in this case, for the asthmatics it's not even ethical to collect the right data: we're not allowed to send half of the asthmatics home to see if not giving them high-quality care is good or bad for them, just to train a more accurate model. So we have to somehow live with the data we've got. So what's the fundamental problem here? You might think there are other solutions: we could change the learning problem, we could predict the care the doctors give. It turns out none of those actually work, although I don't have time to talk about it. Surprisingly, the fundamental problem here is missing variables. Asthma is serving as a proxy for something you'd really like to know, which is time to care, time to treatment. At least for the asthmatics, what's happening is they're getting to care faster, and that's a really good thing for pneumonia patients, so asthma is a nice proxy variable for time to care. Unfortunately, we can't measure time to care: we don't know when you came down with pneumonia, we don't know how rapidly the symptoms evolved in you as a patient, how quickly you noticed it. So even in principle we can't really measure this time to care
variable so we're actually never going to have it like like you can't just say oh go go but go back get that time to care variable and then this problem will go away we assume by the way if we could get it it would go away that is if we knew the time to care of all the patients then for all the patients who say came in within two days of onset of symptoms then presumably it would turn out that the asthmatics really were higher risk you know compared to other patients who also came in in two days but didn't have asthma okay so now you might think well can't we fix this asthma problem some other way now that you know about it yes there are lots of tricks for fixing this problem the important thing is though you have to know about it we can't find it in the neural net the only reason why we know about this is because somebody turned to rule based system that we could understand and we read the rule which was that asthma lowers risk it turns out we now have this class of very interprete Balma dolls which lets us see what's being learned in from this data set it turns out that having a history of heart disease and a history of chest pain is also good for you and it's good for you for exactly the same reasons because if you've had a heart attack before and you're now feeling something funny in your breathing you don't even waste an hour before you're in an ambulance or you get to the ER okay so these sorts of problems are ubiquitous they're happening in all the datasets that we look at these sort of proxy of facts they're all over the place we've actually seen several dozen problems like this in this data set that I don't have time to tell you about and we now think that you know you just won't know about these problems you wouldn't have anticipated them in advance if you can't sort of open up your model and see what's happening so so we think interpretation is very very important for these models or else the model is just gonna learn risky things so thanks [Applause] alright interpretability interpretability so it's not that interpretability is completely useless but it's not nearly as useful as you think first of all the vast majority of decisions made by automatic systems this is so many of them nobody actually wants to look at it you know there are classifiers used absolutely every time you go to a website every time you go to Facebook to Google you do a search there's a model it's tailored for you and it makes you know decisions about what to show you all that stuff and there's billions of those things going on every day and there's just not enough brain cells on a planet to even think about interpretability of those things there's just not enough time that's the reason why we want automated decision systems because we don't have enough brains to make those decisions will be economically unfeasible so the vast majority of decisions don't need interpretability or if we had it it would be essentially useless nobody want to look at it now there is a small number of domains where interpretability is not only useful but required like legal decisions like the whole point of a legal decision is you write an opinion so that's the whole thing is interpretability basically and I would concede for certain types of medical decision so that's the example that rich was was talking about but let me tell you a story several years ago I collaborated with a bunch of economists and they were interested in building models to predict real estate prices house prices so we started working on this project they had 
some data set for our houses in the Los Angeles area and oddly enough and we built two models one that was very simple based on essentially linear prediction or some kind of a local linear predictor and that was simple enough that the economies could interpret it and it could write a paper about it in an economics journal and you know look at the coefficients and say well you know this part was actually important to determine price and whatever and draw you know drew conclusions from it and then we built a second model that was this you know big neural net with some sort of latent variable system that you know computed the desirability of an area and you know it was fairly sophisticated it worked really well much better than the first one so The Economist wrote a paper they used the first the first model because I was useful to them even though the prediction was not so great and then they turn around and I said you know this this thing kind of works kind of well maybe we could do a startup and like predict the price of houses and you know now we could sell this to banks like to figure out the probability that someone will default on their on their mortgage and say okay now which one do you want the one the explainable or the one that actually works and of course you know they want the one that actually worked every single time that I've talked to people who wanted to use a machine learning system and they they look at you in the eye and they say you know it's very important that there is an explanation that's produced for every decision that's made by the system it's it's super important we can't actually we're not interested in you producing anything unless it produces an explanation and then you show them two systems one that has an explanation that's kind of simple its generalized linear model that kind of works but not so well and another one that's you know a little more complicated that works better every single time you'll take the second one every single time which means they don't actually care about explain ability is just a way for them to be reassured and most people when you show them the data are more reassured by an hour c-curve than by an assurance that there is an explanation for what bad decisions the system makes now we deal with this every day the FDA it deals with this every day when you know you get a test for let's say Lyme disease you know the test for Lyme disease as an hour seeker or precision recall or whatever it is that they call it in medicine and it's and it's awful is terrible like the error rate is is atrocious we still use it that's the only thing we have so you know until the 1970s we had no idea how aspirin worked people still to aspirin you know now we have better understanding of it but you know there's a lot of things like this that you know we don't quite understand the action mechanism of medicines we still use them so you know explanations are not that useful if you can prove that the system works in the real condition there is no sufficient kind of testing mechanisms that show you that it works in the condition it is supposed to work in so there are other ways to test the validity validity of a system then explanation and very often we are hypnotized by the fact that we want explanation because it reassures us but in fact factually it doesn't mean that the system is better because it's explainable now the last point I want to make is that this whole story that neural nets have black boxes no they're not like you can look at all the variables in 
them, you can do sensitivity analysis. What you want is, even if you use a gigantic neural net to make a decision about whether you want to give someone a loan or not, you don't want to produce an explanation of the type "oh, this coefficient is large, so blah." The explanation you want to give people is: if your income was 75 bucks per month higher, then you would have passed the test, you would have gotten the loan, or something of this type, or if your credit score was a little higher, by that much. Now, you can do this sensitivity analysis by gradient backpropagation with any differentiable model you have; it doesn't matter if it's a gigantic neural net or a linear system, you can do it in all cases [a code sketch of this kind of gradient-based sensitivity analysis appears after the transcript]. So those things are not black boxes, all right? We can completely investigate how they work. Thanks. [Applause]

All right, so I should also confess that I am a deep neural net guy, an old deep learning guy, but today I'm talking about interpretability; I'm going to defend interpretability. So, well, let's define it, because otherwise we don't know what we're talking about, and I'm going to define it by talking about a function. The goal of machine learning is to find a good function among a space of candidate functions, and let's call that space of candidate functions the hypothesis space. The problem is that in many cases, including most of the applications of deep learning, we feel like the hypothesis space is fixed, and that's a problem because the hypothesis space can be very, very large. Let me give you a simple example. Imagine the task is, given a scenario, to write a fiction book. That's a very large space, and even if we don't start from pixels or characters, we just start from words, and we take a room like this and we put on this side positive examples of plot lines and books, and on this side we put just gibberish, and we put in a two-year-old, or your best learning algorithm, and we decide to wait: we know it's not going to work. We can grow more brain cells, we can add more GPUs and more data, and it's still not going to work. And yet probably everyone in this room can do it. So why is it that it's actually not that hard a function to learn, and everyone here can do it? I'm going to argue it's because we grow the hypothesis space gradually. We learn about words, we learn about sentences, we learn about story, we learn about metaphor, we learn about how to create characters, and this gradual growth of the hypothesis space, I'm going to call it structural interpretability. This ability to break down the problem into smaller problems makes life much easier. Now, is this new? Well, in some sense it is, because if we look back 30 years ago we had the famous failure of the expert systems, with all these rules, and I'm going to argue that this was because all these rules defined the hypothesis space and it was flat, and because of that we couldn't debug it; it was so complex, there was an explosion in the number of rules, and there was just no way to fix it. And I'm going to say that deep learning has a little bit of that symptom right now: it's flat, it's inexplicable, we just throw a lot of data at it. If we have a lot of data, that works well, but if we don't have a lot of data, then it becomes more complex. And the thing is that it's not completely flat, because we keep cheating: from year to year, at every NIPS, we add ReLU, we add convolutions, we add some regularizer, and we keep basically changing the
hypothesis space and this progress we document them and in nips paper and whoever doesn't care about interpretability should stop coming to nips for the interpretability of the latest change in the hypothesis phase so we could argue that we do care all at least a little bit about interoperability so we can document the evolution of the hypothesis space which basically makes life easier so basically what i'm talking about here is what I call decomposition or structural into probability I think it makes learning a lot easier I think it's it's it's a very good things to do it's also called modularity and I'm sure we all agree that modularity is a good thing and I think if we want to tackle some very complex problem that kind of interpretability is necessary thank you [Applause] hello and so I want to argue that course with young that you know interpretability is important in very very rare cases and these are extremely rare and often appear kind of in as a motivation in some papers on interpretability in the real world very very often you do not need it and one thing you know that you have to remember is you guys are all using things that are really really complicated for example there's not a single person on this planet who understands how an i7 core Intel chip works it has so many transistors that nobody has any idea how it works some you know there's different teams that understand different parts right but nobody understands the whole thing and yet each one in this room is totally comfortable with using it every single day right of course you can argue you know there's level of abstractions so you could say here's an ALU and here's a memory chip etc but you know you're actually not doing this right ultimately you're not really thinking about how you computer chip works you're just using it and it works and that's exactly what will happen with machine learning about machine learning becomes more and more commonplace in those places where it works it just works there's various reasons why people ask for interpretability and one of them is you know understanding data right so if you're a neuroscientist you run a machine learning algorithm at the end you're really not interested in the prediction you're interested in why it makes certain prediction that's a very valid claim but that's really what is that that's a sensitivity analysis why do you really what you want to know is which feature you know had one input or you know at what effect on the on the output so you don't really have to have a very interpretable method right that you know we've been doing doing that for years and people in neuroscience and biology have been doing this for a long time I wouldn't call that interpretability another thing is machine learning debugging like you know people want interpretive machines such that you know something goes wrong you can fix it but that's not really people fix machine learning algorithms is it right you don't really think about oh you know what does every single weight in your neural network do right instead actually you train you know you change the way you're training it etc you collect more data you clean your data and so on another one is accountability and so you know for example an accident happens in a self-driving car who is accountable for it but again the law doesn't really work that way all right so now that you know a similar case actually this immunization immunization goes wrong occasionally but you don't work you know you don't then investigate why exactly went wrong right 
this is just something that happens statistically very rarely and you have to have some legal framework that incorporates this but finally the most common reason I think what people who want interpretability especially when you you know non-experts it's trust right people are just scared of machines and that reminds me of the 80s and the you know I remember the time and I read this a newspaper this was like when I just learned to read I guess and and for the first time women became pilots this I grew up in Germany so at the time you know only men were pilots and people got off the plane because a woman was a pilot and they were like you know while we're all for emancipation but that's going too far right and that seems crazy now in hindsight let me think about it right you know why do people have a problem with women female you know women being pilots it makes no sense but basically people were just scared because was something they weren't used to right and as ridiculous that as that seems now looking back on it I believe that and you know in 30 years it's gonna be crazy that people are scared of self-driving cars and the reason is because it's going to be much much safer than people driving cars in fact in 30 years people won't be able to believe that humans wants to off-price right which seems ridiculously scary if I still a few minutes I could do a little experiment let's do an experiment I have one minute awesome I want to do an experiment a poll with the audience so please participate so imagine right it's a sad story sorry to tell you this you have a horrible heart disease and you need heart surgery okay you luckily there is a procedure against this and you know can go to the doctor now and their surgery unfortunately it's pretty dangerous so there's a 10% mortality rate alright so just picture this 10% mortality rate so that's a very human surgeon does it better wife or was a machine that just came out by some company its proprietary you have no idea how it does now how it works but has only a 1% mortality rate it's a 99% of the time it works great the difference is when the human makes a mistake where it accidentally cuts the wrong artery and you die afterwards he or she can explain very clearly oh you know I cut the wrong artery right really sorry about that one where is the Machine you have no idea right it's just some programming now I'm asking you to think about your family and your loved ones and I'm asking you now which one of you would pick the human surgeon raise your hand all right I have some few brave ones which one would take the robot surgeon all right thank you point taken so now we're going to have the each team ask a question of the other team and so we're gonna start with the affirmative team gets to ask a question to the negative team once they respond then the negative team will have a chance to to ask a question and so on and so forth let's see I certainly agree that if you've done adequate testing of this robotic surgeon and in the real world it has only a one percent error rate compared to ten percent for you absolutely agree that's that's the the surgeon I would choose the problem is in the pneumonia example the model was actually rewarded with high accuracy on the test set for predicting that asthmatics with heart disease and chest pain were low risk the model really has false high accuracy and the only way you've detected has the wrong high accuracy for our intended use is to deploy it kill people detect that you've killed people and then pull the model 
back and try to figure out what's wrong and fix it however if you've got something intelligible or interpretive all you have a chance of finding at least some of these problems before you go ahead and deploy the model and kill people I mean I don't know what they did to the surgical model before they deployed it right it's really hard to get a test set that is actually representative of what's happening in the wild this is the fundamental problem and what we've basically seen is that you know you just can't trust performance on a test set in many domains especially when your intended use of the model is something that's sort of causal or counterfactual you didn't evaluate the model that way when you trained it perhaps so now you really have to do something different and we just find that every time we put on this magic glasses where we can see what's inside the model we find an amazing number of things that are wonderful beautiful that do really make the model high accuracy but we also see things that are very surprising very disturbing and that no human would ever let get deployed in until they found a way to fix it and it's just so important to be able to put on the magic glasses in many domains and see these things before you deploy so so I guess the question is how do you counter that I mean in these domains are you really going to just release the system and then see what happens and then after you know it's good or not good you'll decide whether to keep using it so I have extremely bad news for you for you this is exactly how drugs are tested the the way drugs are tested is you actually try it and you know hopefully you don't kill too many people but that's this is a cure works and and you do it in phases where you start with with you know chemical tests and then you talk with animals and then and then it's drugs deep longing is drugs no that's a bad joke this is a cheap shot I'm sorry yeah maybe those like hallucinatory paintings that are generated you know you know you you can add the product progressively on sort of increasingly larger populations and you have you know a/b testing and things like this and that's that's how everything is tested because ultimately regardless of the fact that you expect you know that you have an explanation or not you cannot guarantee that your explanation is correct whereas testing is the only thing that guarantees that something is its practical correct or helps people or helps cure them and at some point you have to test it right so you know there are testing procedures to avoid killing people with drugs sometimes you know tests for drugs are stopped early because they have bad side effects or they actually kill people or what they are not deemed effective sometimes they're deemed so effective that also the tests are are stopped because it becomes an ethical to not give the medicine to the the be the bee population so I mean there are procedures that have been developed over decades for that for this kind of thing and I don't see why would be different for a system that it is for for drugs or for any kind of systems that we test thoroughly you know we we might have proofs of correctness for the the software that we put in in airplanes and you know using formal methods and stuff but ultimately there is years of testing of the airplanes and there is lots of fixes that take place well you know the resistant testing period that's the only way you can guarantee the system works now if your testing procedure doesn't detect bad behavior that means your 
testing procedure sucks and you have to change it that means the data set is biased that means you know it's not general enough and you kind of have to make it better so if the statement is there are bad ways to test the system yeah of course they are you can always do it wrong just a brief response I guess the issue here is we're not allowed to do the randomized clinical trial that you're thinking of for patients like asthmatics and heart disease patients for which we actually already know the science reasonably well and know that delaying or withholding care would be bad for these patients so I completely agree with you that for let's say future signals future medical tests things which may be no human really fully understands but which perhaps a deep model looking at raw signals actually can understand that then you do have to go through exactly what you're describing which is you you have to you know train the model as best you can evaluate as best you can and then do a sort of careful rolled out testing I have no problem whatsoever with that but there are a huge number of problems for which it is completely unethical to sort of adopt that approach and the bias problems I have these proxy problems they just exist in every medical data set I look at which means we would never be able to even do that first testing and deployment or clinical trial in a lot of these situations of a deep model because the first one we would discover it was killing asthmatics and heart disease patients and all of a sudden they would just never run another trial so it seems to me the the problem you've been describing is not necessarily one of interpretability of a model but one of causal inference in your in your data and I certainly have absolutely nothing against causal inference I think a lot more people should be working on cars on inference and currently are there's a lot of work at face book on this and you know it's it's part of like a perhaps a very core important thing in in AI to be able to establish causal relationships so that's super important but it's explaining the data it's not explaining the dis model is actually explaining the dependencies in your data that that's important I think that level can we can we now have a question from the negative side towards the positive side okay if I make I I want to say something so when we talk about machine learning in many cases we we talk about learning from data and unfortunately it's very often restricted to learning from labels and n examples and in some way it's one of the dumbest way to communicate learning it's it's you need a lot of data and you need a very long enumeration and then then the magic of deep learning will solve the problem if you have infinite amount of data now there are many other ways to communicate features is one of them and and I told you agreed that you can have I actually don't care about what's inside the box so the black box things absolutely I have no problem with this but I love it when a black box takes other black box as input and I can actually train those black box and basically get some guarantees and now the the higher black box is no longer big black box because it has smaller black box that I can explain and I can keep decomposing and that decomposition is is a language is a way I can teach and is far more effective than the one bit that that the label can provide imagine you want to learn algebra and you go to a class that teaches algebra and we found out that your teacher can only give you yes or no answer on 
question that you decide to ask and is that effective no that's not very effective there are much better ways to communicate knowledge and and they just happen to be interpretable because it's about asking the right question I mean this is what Socrates was doing it was he was asking the right question as a way of explanation so I actually have a question now so any of the arguments you've used during your your first five minutes so since then can any of those be interpreted or construed as to be an argument in favor of say generalize your models or trees versus a neural net if you don't have either a model class that's very accurate and completely interpretable or techniques for taking something that's much more complex so you mean one of those things that I listed is non interpretable and some of them are so you know a complex ensemble of trees is not very interpretable yeah so so you need I think at least one of two things one is either a model that is itself very accurate so that you're happy to use it but is also very interpretive and maybe they don't exist in some domains I don't think such a thing exists yet in pixels or raw speech signals or places where deep neural Nets are king of the hill I don't think we have anything that can touch it that way or you need techniques that can take a blackbox model maybe a deep neural net and then help you try to understand what what's in it you know open up the black box a bit I think you need one of those two things and I'm certainly not in any way suggesting oh we should stop working on deep neural nests or anything like that I mean they're they're great where they work really well they work incredibly well right now but I think looking inside to find these surprising things that I now realize are sort of landmines scattered throughout almost every data set I look at is just very very important because they would surprise you like if if I gave you the magic glasses and you could actually see everything that was going on inside your neural net in some way you'd be very happy with a lot of it because it's it's really doing a lot of clever things you might actually learn some things from it which would of course be very nice but you would find the a surprising number of things that would scare you and once you some of them are so obvious once you see them it's like oh of course I just sort of realized it was going to do that in the first place and you can't just sort of say oh well then you have to jump to causality because even the causality we're not allowed to do the right causal experiments to collect it on the asthmatics and the heart disease patients what you know just not ethical to do this anymore we have to adopt perhaps like Patrice is arguing sort of training procedures that are somewhat different than just a massive amount of data right we don't train doctors by just giving them a massive amount of data they actually learn you know chemistry physics they know what humans are and then you start teaching them medicine and then you give them practice we don't give them like a hundred million patients who lived or died and say oh just figure it out we actually teach them a tremendous amount of other stuff and because of that they have all sorts of context and background information that prevents them from making what are very silly mistakes I mean everyone in the room frowns when you say asthma is good for you if you have pneumonia but the neural net I train didn't frown when I when it learned that asthma was good for you it was happy to 
use it, and was in fact well rewarded on the test set for doing it.

Yeah, I mean, AI doesn't have common sense yet, that's for sure. One thing I want to say, though: you're saying we shouldn't just train on a lot of data, but that's exactly what these machine learning algorithms are good at, and that's the one thing that actually works really successfully, right? I mean, if you think about the machine learning landscape, what has been so successful in the last couple of years, what has created all this attention that we got, is supervised learning, right? Because it works really, really well. And what is it? It's just: give it a ton of training data and it finds, basically, patterns to predict the label, and that's really all it does. I think the problem is that people think there's more going on than that, so people kind of humanize these classifiers and therefore expect to understand more than there is, but really it's just patterns in the data. So one question I have for you is: what algorithms do you think are interpretable right now? I know you don't think deep nets are, all right, but what makes the list?

Boy, that's a tough one. I think there are techniques for trying to open up deep nets, and we're now using some models that are pretty interpretable, like these fairly complex generalized additive models, where we actually train them to mimic the deep net and then we look at what this model learns when it's a fairly good mimic of the deep net, to try to understand it [a simplified sketch of this mimic-model idea appears after the transcript]. Once again, though, this would not work at all in pixel space or raw speech space or places like that; this would work in places where the features have human meaning and where you could understand what the explanation might mean. There are a number of other techniques: there are people in the room who are developing very specific kinds of tree-based, rule-based systems whose inductive biases are specifically designed so that doctors can actually understand them, and things like that. So there are some techniques, but I don't want to make it sound like we're anywhere near as far along on the interpretability front as we are along the supervised learning, deep learning front. An interesting thing, though, is that deep learning models are, to me, sort of idiot savants: they're incredibly, breathtakingly good at some things, and yet they make really simple mistakes. You can take an image of a cat, cleverly manipulate just a few pixels, it still doesn't even look like the image has changed as far as I'm concerned, it still looks like a cat, but you can do this in such a way that the model suddenly thinks, with high confidence, that it's a dog. Or you can take random bit patterns that look like random bit patterns to me and edit some of the pixels in such a way that it's now very confident that it's a dog, even though it still looks like a random bit pattern to me [a sketch of this kind of pixel-level adversarial perturbation also appears after the transcript]. Or there's the example in the talk earlier today of the system not recognizing cows on the beach, even though it's very accurate at recognizing cows in general. So these deep models are amazingly good in some ways, and yet they're remarkably non-human in some of the errors that they make, and this makes me think that at least in certain domains it's important for us to go that extra mile to try to understand what they've learned, because they don't have that simple bias that we have, the sense that
it's pretty obvious asthma and heart disease just aren't good for you they just don't know that and they'll happily make mistakes along this these fronts and we just see this sort of time and time again and either bias recidivism fairness and transparency kind of data or we see it in in medical data so so it's just sort of ubiquitous that these things are doing something sometimes incredibly smart you can't necessarily trust the accuracy on the first test set to really be an indicator of their performance in the wild and it would be really nice to to just sort of have some attempts at opening them up to understand not even the individual predictions that make but maybe just understand the model at a whole so that you can kind of vet it or trust it so here is perhaps why this argument is wrong there is a very recent paper by our friends at deep mind on you know alphago zero and then more recently they applied a very similar system to chess and the system was essentially able to you know play at a very high level in just a few hours of training of course on a lot of machines in parallel that system has you know plays in ways are very very strange for a human player it doesn't know about the value of particular piece and you know it plays in very unconventional ways that grandmasters find surprising but still get bitten so that system would beat you every single time and it has no idea of the concept that every expert playing chess has and so that kind of is an example and I'm not saying you know because we saw chests with so medicine right but obviously I don't but but what what he suggests is that the kind of explanation for particular types of moves that we would provide for for chess just apply to our minds but don't apply to actually a system that can beat us right so we can have systems that are more accurate make lower error and that testable in testable ways that will systematically beat us and be better than us and they don't have the same kind of you know intermediate concepts that we have explanations for why they work or even the same strategies they still work better well so that works really well when you have infinite amount of data and the chest case is infinite agree if you if you live in a big city you know critic of this but but they are not that many problems that have infinite amount of data I mean the the problems that have a fixed distribution like images or speech or genome or proteins well we keep accumulating every year another million or two million or ten million labeled point just you know actually images were more data and what we can we can deal with we we can basically build an entire image net in a day but so a lot of our companies including yours with the resources we have we can build a new image net every day yeah but this it's essentially an infinite amount of data but but yeah yes are you agree since an infinite amount of data but they are not that many problems yeah I can you can you cite 50 problems that have an infinite amount of data actually yeah just about 50 all right how about the other problems no I mean there is clearly a lot of important problems where you know we don't have nearly as much data it have been huge advocates of actually trying to develop methods that would allow a machine to run with a very limited amount of data by you know learning unsupervised and acquiring some sort of common sense and you know learning representations of the world etc so I you know I don't I don't want to you know make the opposite argument here certainly and 
there's you know a lot of really really important problems I mean there are situations like like you know driving cars you certainly don't want you know we seem to have the ability to learn to drive a car without actually crashing mostly most of us don't crash too much I mean we is a high probability of crashing in the first few years you're a driver but you mostly don't to drive without without crashing if you use one of the current reinforcement learning systems that I use for you know playing Atari or go or whatever and to learn to drive a car hopefully simulator the system I'll have to you know run off a cliff about fifty thousand times before you figure that it's a bad idea to run off a cliff and then you know when I feel if another fifty thousand times to figure out how to not run off a cliff so that clearly is not as efficient as the kind of learning we we do and we certainly have to find methods for that and perhaps what we've come as a consequence of those kind of methods is you know perhaps some level of background knowledge about the world and common sense that may solve some of the problem set which was was talking about earlier I have a question so imagine that well let me just tell you a story that happens all the time a model is built in in a company in a big company and at some point someone discovered the model doesn't perform up to spec and the spec may be statistical and so we can actually detect it and then then we go talk to the person and we find out that person has been hired by yawn and now we don't have that person and and now now what do we do while we look at the model and we don't understand anything that's going on in it we don't know how the data was collected we don't know how the features were created we don't know we don't know and not actually that's the good case the bad case is that the model stopped working up to spec and but we don't know the spec and we don't discover that the model is working and all the models that depend on it arm down broken and in programming we have standards when you do a check in you check in everything and you know that the model is consistent we don't have yet that kind of discipline for for machine learning we don't we don't have a check-in that checks the data the features and the code that basically works together plus the regularizer and all that stuff and and we don't have a check-in that forces you to give a name to basically what happen all that you have so the question is what is the discipline that we need basically to to control the semantics of machine learning so that we can interchange the teachers and and we can have continuity and maintainability across multiple people when if we don't have at least some explanation of what's going on let me I know what you're describing I totally agree that's a problem but I don't think that's a problem of interpretability that's just of discipline as you said now you know of course you know if you work at a company I used to work at a company and I had to deal with other people's code who had left to Google Google and and I had to go through that spaghetti code and make sense of it and that's awesome Italy had nothing to do with you know I was not machine learning or anything right it was just basically that people didn't have these these standards I guess at that time was you know that you were forced to comment your code etc so that you do that for machine learning algorithms sure you know I think that makes a lot of sense that company should enforce this but it doesn't 
have the machine algorithm doesn't have to be interpretive all right as long as you describe exactly what model it is you trained and what the data was you trained it on and what the hyper parameters were I mean we faced it all the time with our papers right we release our code and you know some people are better than others but we try to make our results reproducible and I think you know it works if you are just conscious of the fact that you know you have to be careful and make it easy for them and include all the details but then it's totally doable yeah I think I mean this sticking aspera had to do outside of the original topic but I get confronted very often with people who come to you know the idea of using machine learning with the the state of mind of sort of classical computer science you know so Computer Sciences is kind of slightly obsessive-compulsive aspect where yeah you know you write a program and every single detail has to be exactly correct or your program will just not function and and you know if it does function you can prove it functions correctly you can have you know convergence bounds and things like that and that's the kind of you know exact obsessive-compulsive side of computer science the kind of discrete math side of side of things but then all of what we do with machine learning is solving unsolvable problems approximately and so we're never going to get you know I'm not a percent correct and now the approach is more akin to what people do in medicine or certain areas of engineering where you know engineers invented things like ROC curves and an ROC curve is a way to determine like how well your system will work you know what the trade-off you you're going to choose between detecting things you know detecting things are not of the type that you want to detect or missing things that are of the type you want to detect and that that's that's one our C curve is it was not invented by machine learning people it was invented by radar operators essentially so so we can I have to be more flexible with our criteria and and we have to define some new methodologies as you are referring to is certainly for assessing or or ensuring the reliability or the system will deploy okay we'd like to thank the debaters and finish with the question say something case can I ask one question it's very quick okay imagine you have two learning methods both of them have a 5% error rate on test data and one of them is sort of interpretive all and a human experts a doctor has has looked inside and sort of vetted it and then the other one is a black box you know perhaps a deep neural net that we don't understand yet which one would you prefer I mean would you prefer the interpretable model or would you just trust that 5% is 5% is 5% and you don't really care for the coin pick a model so you can adjust say early I asked which methods are interpretable and you set decision trees there's a huge gap but what did you say no no no I mean most no no there's a form of generalized additive model that's reasonably accurate somewhere was a decision tree you know no wait but go ahead I'm saying of course if you can have it all sure right i but but you cannot and i think the gap right now is you know still pretty large between you know people join these much much simpler models right that they believe are interpretive all you know man ultimately the point that yang made and that you know i also tried to make was you know if there's a gap between a very accurate model it's not interpretable and one that's 
a lot worse but isn't reputable everybody takes the the better one assuming the trustees being equal sure right now assuming they trust the test data of course i I completely agree but while there might be a gap now on many problems there are many problems for which there isn't again right now and with further I mean you seem to agree that if you could have interpretability and accuracy at the same time of course so so that's sort of suggest well it's important you should work on it it's a big if yeah maybe it is a big if but but I think for many problems you know it's probably achievable or at least a way of opening up the black box models and understanding all right so to conclude the debate we'd like to ask each of you to describe what you think the best argument is for the other side yes we'll start with rich it's a great question by the way so there's no doubt that if you can get significantly higher accuracy on test data that you trust that doesn't have the kind of problems that we're seeing in test data where you're actually rewarded for doing the wrong things there's no doubt whatsoever that if you can get higher accuracy I suggesting would always go for the black box model over the interpretative model I guess if I was a scientist who's more interested in you know trying to advance my field of endeavor and understanding then perhaps I would under but would prefer a more interpretable model for other reasons you know you know to make advances but in a in a purely predictive situation where you've got a model that you completely trust is more accurate you'd be just sort of crazy to not use the more accurate model unless there was some huge advantage to having the explanation and ability so let me give it a crack so it's clear and I'm gonna you know can I make an argument that justify some of the stuff I'm working on but but it's still an argument you know from from the other side there are stupid mistakes that the stupid mistake say to describe like the asthma you can detect them as mistakes or not mistakes of the model but probably bias in the data because of common sense because of common knowledge you know some extraneous knowledge that you have about how the word works that you know makes makes you be very suspicious that you know asthma you know makes you more healthy in some way so that kind of props you to investigate and so the argument there is that machines are going to make stupid mistakes until they have the same kind of background knowledge that we have until they get common sense and so that could be an argument for being careful and you know being more thorough in the testing I guess and and you know let's be careful let's have very careful methods for for testing because machines don't have common sense we can't rely on their background knowledge to kind of catch stupid mistakes of this type all right so since I used to be a deep learning guy it's it's pretty well okay I'm still so I remember there was a time where somewhere someone should remind people this is the first guy who actually trained a commercial net on GPUs and now he's so so what I was going to say is that I remember that when we were doing known that a long long time ago and and people some people were skeptical and I remember thinking what's wrong with these guys because it's absolutely it was absolutely obvious to me that even if our model was not the best yet we would keep adding data and we train and it will eventually win and so I didn't understand why people did not embrace neural nets at the 
time because it seemed like how can you possibly lose you just wait there's a little bit more data you increase the capacity and eventually it works now again that works when you have a lot of data and so if you have a lot of data absolutely deep learning and and if you infer the knowledge from data interpretability is I would probably agree it's overrated so one thing I'd like one mistake that humans make is that humans always think like humans and humans think that machines think like humans so we humanize machines and so when people see in your network or machine learning algorithm we make decisions we kind of have in our mind why it makes decisions the way does so let's say classifies cows and we assume all you know images of cows as cows while we assume obviously it's because because that's a big fat cow in the image right but it may just not be it may just be because of something else and that doesn't occur to us because we wouldn't think that way right but in the data could very well be and this is the example that I guess rich gave early on that every single cow image had green grass in the background and every single car image has a street in the background and all it does it looks at the pixel and you know bottom right corner of that's green and it's a count if that's you know black it's a car and so that's ultimately the machine is not doing anything wrong right that's a clear pattern in the data set and so it does you know ultimately we asked it to make these predictions and it does it right it's just that when we see it we assume that it works like a human so interpretability helps us debug this in a way that we realize you know no no machines can actually think very great or think actually shouldn't say think can operate very very differently from the ways that humans would make these decisions and I think that's that's a strong argument why you need this why you need interpretability ok great thank you all and let's everyone give a round of applause for our wonderful panelists [Applause]
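LeCun's point in the transcript, that sensitivity analysis by gradient backpropagation works for any differentiable model, can be illustrated with a small sketch. Everything below is an assumption for illustration: the tiny PyTorch loan-scoring network, the feature names (income, credit_score, debt), and the applicant's numbers are made up, and the model is untrained, so the printed sensitivities only demonstrate the mechanics.

```python
# Minimal sketch of gradient-based sensitivity analysis for a differentiable model.
# The model, features, and numbers are illustrative assumptions, not from the debate.
import torch
import torch.nn as nn

# Stand-in for the "gigantic neural net" that scores loan applications.
model = nn.Sequential(
    nn.Linear(3, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
    nn.Sigmoid(),  # output: probability that the loan is approved
)

feature_names = ["income", "credit_score", "debt"]  # hypothetical features
applicant = torch.tensor([[4200.0, 680.0, 310.0]], requires_grad=True)

score = model(applicant).squeeze()
score.backward()  # fills applicant.grad with d(score)/d(feature)

# The gradient answers questions of the form "if your income were a bit higher,
# how would the approval score move?" -- the explanation style LeCun describes.
for name, grad in zip(feature_names, applicant.grad[0]):
    print(f"sensitivity of approval score to {name}: {grad.item():+.6f}")
```

In practice one would run this around a trained model and translate the gradient into a threshold statement such as "75 bucks per month more income would have gotten you the loan," but the gradient call itself is the same for any differentiable model.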
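Caruana's "magic glasses" rely on interpretable model classes and on mimic models trained to match a black box's predictions. The sketch below is a simplified stand-in for that workflow rather than his actual pipeline: scikit-learn gradient boosting plays the black box, a shallow decision tree plays the interpretable surrogate (his group uses generalized additive models), and the data set is synthetic, constructed so that "time to care" is a missing variable for which asthma acts as a proxy.

```python
# Hypothetical sketch of the "mimic model" idea: fit an interpretable surrogate
# to a black box's predictions so a human can inspect what was learned.
# Data, features, and the choice of surrogate are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
n = 5000
# Synthetic "patients": age, an asthma flag, and an unobserved time-to-care variable.
age = rng.uniform(20, 90, n)
asthma = rng.integers(0, 2, n)
time_to_care = rng.uniform(0.0, 5.0, n) - 3.0 * asthma   # asthmatics get care faster
risk = 0.03 * age + 0.5 * time_to_care + rng.normal(0, 0.3, n)

X = np.column_stack([age, asthma])        # time_to_care is missing from the features
feature_names = ["age", "has_asthma"]

black_box = GradientBoostingRegressor().fit(X, risk)     # stands in for the deep net

# Train an interpretable mimic on the black box's own predictions and print it.
mimic = DecisionTreeRegressor(max_depth=3).fit(X, black_box.predict(X))
print(export_text(mimic, feature_names=feature_names))
# On this synthetic data the has_asthma branches carry lower predicted risk:
# the proxy effect Caruana describes, visible only because the surrogate is readable.
```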
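The cat-to-dog failure Caruana mentions is the standard adversarial-example phenomenon. Below is a minimal fast-gradient-sign-style sketch under stated assumptions: the classifier is a tiny untrained stand-in network, the "image" is random noise, and the label index is arbitrary, so the printed class indices mean nothing; the point is only the mechanics of nudging pixels along the sign of the loss gradient.

```python
# Minimal FGSM-style sketch: perturb an image along the sign of the loss gradient.
# The network is an untrained stand-in and the image is random noise, so the
# printed class indices are meaningless; only the mechanics are illustrated.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(            # stand-in for a trained image classifier
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 1000),
).eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a cat photo
label = torch.tensor([0])                                # assumed "cat" class index

loss = F.cross_entropy(model(image), label)
loss.backward()

epsilon = 0.01                                           # small, visually imperceptible step
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

# With a real trained classifier and a real photo, a step this small is often
# enough to flip the predicted class even though the image looks unchanged.
print("original prediction:   ", model(image).argmax(dim=1).item())
print("adversarial prediction:", model(adversarial).argmax(dim=1).item())
```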
Info
Channel: The Artificial Intelligence Channel
Views: 38,010
Rating: 4.8841462 out of 5
Keywords: singularity, ai, artificial intelligence, deep learning, machine learning, deepmind, robots, robotics, self-driving cars, driverless cars, Yann LeCun, Neural Information Processing Systems, NIPS2017
Id: 93Xv8vJ2acI
Length: 58min 16sec (3496 seconds)
Published: Wed Jan 31 2018