Ian Goodfellow: Generative Adversarial Networks (GANs) | Lex Fridman Podcast #19

The following is a conversation with Ian Goodfellow. He's the author of the popular textbook on deep learning, simply titled Deep Learning. He coined the term generative adversarial networks, otherwise known as GANs, and with his 2014 paper is responsible for launching the incredible growth of research and innovation in this subfield of deep learning. He got his BS and MS at Stanford and his PhD at the University of Montreal with Yoshua Bengio and Aaron Courville. He has held several research positions, including at OpenAI, Google Brain, and now at Apple as the director of machine learning. This recording happened while Ian was still at Google Brain, but we don't talk about anything specific to Google or any other organization. This conversation is part of the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at lexfridman, spelled F-R-I-D. And now, here's my conversation with Ian Goodfellow.

You open your popular Deep Learning book with a Russian doll type diagram that shows deep learning is a subset of representation learning, which in turn is a subset of machine learning, and finally a subset of AI. So this kind of implies that there may be limits to deep learning in the context of AI. So what do you think are the current limits of deep learning, and are those limits something that we can overcome with time?

Yeah, I think one of the biggest limitations of deep learning is that right now it requires really a lot of data, especially labeled data. There are some unsupervised and semi-supervised learning algorithms that can reduce the amount of labeled data you need, but they still require a lot of unlabeled data. Reinforcement learning algorithms don't need labels, but they need really a lot of experiences. As human beings, we don't learn to play Pong by failing at Pong two million times. So just getting the generalization ability better is one of the most important bottlenecks in the capability of the technology today. And then I guess I'd
also say deep learning is like a component of a bigger system. So far, nobody is really proposing to have only what you'd call deep learning as the entire ingredient of intelligence. You use deep learning as submodules of other systems. AlphaGo, for example, has a deep learning model that estimates the value function. Most reinforcement learning algorithms have a deep learning module that estimates which action to take next, but you might have other components.

So you're basically building a function estimator. You said nobody is really thinking about this so far, but do you think neural networks could be made to reason in the way symbolic systems did in the 80s and 90s, to create something more like programs as opposed to functions?

Yeah, I think we already see that a little bit. I already kind of think of neural nets as a kind of program. I think of deep learning as basically learning programs that have more than one step. So if you draw a flowchart, or if you draw a TensorFlow graph describing your machine learning model, I think of the depth of that graph as describing the number of steps that run in sequence, and the width of that graph as the number of steps that run in parallel. Now, it's been long enough that we've had deep learning working that it's a little bit silly to even discuss shallow learning anymore. But back when I first got involved in AI, when we used machine learning we were usually learning things like support vector machines. You could have a lot of input features to the model, and you could multiply each feature by a different weight, but all those multiplications were done in parallel to each other. There wasn't a lot done in series. I think what we got with deep learning was really the ability to have steps of a program that run in sequence. And I think we've actually started to see that what's important with deep learning is more the fact that we have a multi-step program rather than the fact that we've learned a representation. If you look at
things like ResNets, for example, they take one particular kind of representation and they update it several times. Back when deep learning first really took off in the academic world in 2006, when Geoff Hinton showed that you could train deep belief networks, everybody who was interested in the idea thought of it as each layer learning a different level of abstraction: that the first layer trained on images learns something like edges, the second layer learns corners, and eventually you get these kind of grandmother cell units that recognize specific objects. Today I think most people think of it more as a computer program, where as you add more layers you can do more updates before you output your final number. But I don't think anybody believes that layer 150 of the ResNet is a grandmother cell and layer 100 is contours or something like that.

Okay, so you're not thinking of it as a singular representation that keeps building. You think of it as a program, sort of almost like a state. The representation is a state of understanding.

Yeah, I think of it as a program that makes several updates and arrives at better and better understandings, but it's not replacing the representation at each step, it's refining it. And in some sense that's a little bit like reasoning. It's not reasoning in the form of deduction, but it's reasoning in the form of taking a thought and refining it and refining it carefully until it's good enough to use.

Do you think, and I hope you don't mind, we'll jump philosophical every once in a while, do you think of cognition, human cognition, or even consciousness as simply a result of this kind of sequential representation learning? Do you think that can emerge?

Cognition, yes, I think so. Consciousness, it's really hard to even define what we mean by that. I guess consciousness is often defined as things like having self-awareness, and that's relatively easy to turn into something actionable for a computer scientist
to reason about. People also define consciousness in terms of having qualitative states of experience, like qualia, and there are all these philosophical problems, like could you imagine a zombie who does all the same information processing as a human but doesn't really have the qualitative experiences that we have? That sort of thing, I have no idea how to formalize or turn into a scientific question. I don't know how you could run an experiment to tell whether a person is a zombie or not, and similarly I don't know how you could run an experiment to tell whether an advanced AI system had become conscious in the sense of qualia or not.

But in the more practical sense, almost like self-attention, you think consciousness and cognition can, in an impressive way, emerge from current types of architectures?

Yes. Or if you think of consciousness in terms of self-awareness and just making plans based on the fact that the agent itself exists in the world, reinforcement learning algorithms are already more or less forced to model the agent's effect on the environment. So that more limited version of consciousness is already something that we get limited versions of with reinforcement learning algorithms, if they're trained well.

But you say limited. So the big question really is how you jump from limited to human level, right? And whether it's possible. Even just building common-sense reasoning seems to be exceptionally difficult. So if we scale things up, if we get much better at supervised learning, if we get better at labeling, if we get bigger datasets and more compute, do you think we'll start to see really impressive things that go from limited to, you know, something with echoes of human-level cognition?

I think so, yeah. I'm optimistic about what can happen just with more computation and more data. I do think it'll be important to get the right kind of data. Today, most of the machine learning systems we train are mostly trained on one type of data for each
model. But with the human brain, we get all of our different senses and we have many different experiences, like riding a bike, driving a car, talking to people, reading. I think when you get that kind of integrated dataset working with a machine learning model that can actually close the loop and interact, we may find that algorithms not so different from what we have today learn really interesting things when you scale them up a lot and train them on a large amount of multimodal data.

So multimodal is really interesting. But you're also working on adversarial examples, so within one mode of data, selecting better the difficult cases which are most useful to learn from.

Oh yeah, like could you get a whole lot of mileage out of designing a model that's resistant to adversarial examples or something like that?

Right, that's the question.

My thinking on that has evolved a lot over the last few years. When I first started to really invest in studying adversarial examples, I was thinking of it mostly as: adversarial examples reveal a big problem with machine learning, and we would like to close the gap between how machine learning models respond to adversarial examples and how humans respond. After studying the problem more, I still think that adversarial examples are important, but I think of them now more as a security liability than as an issue that necessarily shows there's something uniquely wrong with machine learning as opposed to humans.

Also, do you see them as a tool to improve the performance of the system? Not on the security side, but literally just accuracy.

I do see them as a kind of tool on that side, but maybe not quite as much as I used to think. We've started to find that there's a trade-off between accuracy on adversarial examples and accuracy on clean examples. Back in 2014, when I did the first adversarially trained classifier that showed resistance to some kinds of adversarial examples, it also got better at the clean
data on MNIST. And that's something we've replicated several times on MNIST: when we train against weak adversarial examples, MNIST classifiers get more accurate. So far that hasn't really held up on other datasets, and it hasn't held up when we train against stronger adversaries. It seems like when you confront a really strong adversary, you tend to have to give something up.

Interesting. This is such a compelling idea because it feels like that's how us humans learn: the difficult cases. We try to think of what we would screw up, and then we make sure we fix that.

Yeah. It's also the case that in a lot of branches of engineering you do a worst-case analysis and make sure that your system will work in the worst case, and then that guarantees that it'll work in all of the messy average cases that happen when you go out into a really randomized world.

With driving, with autonomous vehicles, there seems to be a desire to think adversarially, to try to figure out how to mess up the system. And if you can be robust to all those difficult cases, then it's a hand-waving, empirical way to show that your system is...

Yes. Today most adversarial example research isn't really focused on a particular use case, but there are a lot of different use cases where you'd like to make sure that the adversary can't interfere with the operation of your system. In finance, for example, if you have an algorithm making trades for you, people go to a lot of effort to obfuscate their algorithm. That's partly to protect their IP, because you don't want to research and develop a profitable trading algorithm and then have somebody else capture the gains, but it's at least partly because you don't want people to make adversarial examples that fool your algorithm into making bad trades. Or, I guess one area that's been popular in the academic literature is speech recognition. If you use speech recognition to hear an audio waveform and then turn that into a command that a phone executes for
you, you don't want a malicious adversary to be able to produce audio that gets interpreted as malicious commands, especially if a human in the room doesn't realize that something like that is happening.

In speech recognition, has there been much success in being able to create adversarial examples that fool the system?

Yeah, actually. I guess the first work that I'm aware of is a paper called Hidden Voice Commands that came out in 2016, I believe. They were able to show that they could make sounds that are not understandable by a human but are recognized as the target phrase that the attacker wants the phone to recognize. Since then, things have gotten a little bit better on the attacker side and worse on the defender side. It's become possible to make sounds that sound like normal speech but are actually interpreted as a different sentence than the human hears. The level of perceptibility of the adversarial perturbation is still kind of high. When you listen to the recording, it sounds like there's some noise in the background, just like rustling sounds, but those rustling sounds are actually the adversarial perturbation that makes the phone hear a completely different sentence.

Yeah, that's so fascinating. Peter Norvig mentioned that you're writing the deep learning chapter for the fourth edition of the Artificial Intelligence: A Modern Approach book. So how do you even begin summarizing the field of deep learning in a chapter?

Well, in my case I waited like a year before I actually wrote anything. Even having written a full-length textbook before, it's still pretty intimidating to try to start writing just one chapter that covers everything. One thing that helped me make that plan was actually the experience of having written the full book before and then watching how the field changed after the book came out. I realized there were a lot of topics that were maybe extraneous in the first book, and just seeing what stood the test of a few years of being published and
what seems a little bit less important to have included now helped me pare down the topics I wanted to cover for the book. It's also really nice now that the field has kind of stabilized to the point where some core ideas from the 1980s are still used today. When I first started studying machine learning, almost everything from the 1980s had been rejected, and now some of it has come back. So that stuff that's really stood the test of time is what I focused on putting into the book. There are also, I guess, two different philosophies about how you might write a book. One philosophy is you try to write a reference that covers everything. The other philosophy is you try to provide a high-level summary that gives people the language to understand a field and tells them what the most important concepts are. The first Deep Learning book that I wrote with Yoshua and Aaron was somewhere between the two philosophies, in that it's trying to be both a reference and an introductory guide. Writing this chapter for Russell and Norvig's book, I was able to focus more on just a concise introduction of the key concepts and the language you need to read about them more. In a lot of cases I actually just wrote paragraphs that said, here's a rapidly evolving area that you should pay attention to; it's pointless to try to tell you what the latest and best version of a learning-to-learn model is. I can point you to a paper that's recent right now, but there isn't a whole lot of reason to delve into exactly what's going on with the latest learning-to-learn approach or the latest module produced by a learning-to-learn algorithm. You should know that learning to learn is a thing, and that it may very well be the source of the latest and greatest convolutional net or recurrent net module that you would want to use in your latest project. But there isn't a lot of point in trying to summarize exactly which architecture and which learning approach got to which level of performance.

So
you maybe focus more on the basics of the methodology, so from backpropagation to feed-forward to recurrent neural networks, convolutional, that kind of thing?

Yeah, yeah.

So if I were to ask you, I remember I took an algorithms and data structures course. I remember the professor asked, what is an algorithm, and yelled at everybody, in a good way, that nobody was answering it correctly. It was a graduate course. Everybody knew what an algorithm was, but they weren't able to answer it well. So let me ask you, in that same spirit: what is deep learning?

I would say deep learning is any kind of machine learning that involves learning parameters of more than one consecutive step. Shallow learning is things where you learn a lot of operations that happen in parallel. You might have a system that makes multiple steps, like you might have hand-designed feature extractors, but really only one step is learned. Deep learning is anything where you have multiple operations in sequence. That includes the things that are really popular today, like convolutional networks and recurrent networks, but it also includes some of the things that have died out, like Boltzmann machines, where we weren't using backpropagation. Today I hear a lot of people define deep learning as gradient descent applied to these differentiable functions, and I think that's a legitimate usage of the term. It's just different from the way that I use the term myself.

So what's an example of deep learning that is not gradient descent on differentiable functions? I mean, not specifically perhaps, but more even looking into the future, what's your thought about that space of approaches?

Yeah, so I tend to think of machine learning algorithms as decomposed into really three different pieces. There's the model, which can be something like a neural net or a Boltzmann machine or a recurrent model, and that basically just describes how you take data and how you take parameters, and what
function you use to make a prediction given the data and the parameters. Another piece of the learning algorithm is the optimization algorithm. Not every algorithm can really be described in terms of optimization, but what's the algorithm for updating the parameters, or updating whatever the state of the network is? And then the last part is the dataset: how do you actually represent the world as it comes into your machine learning system? So I think of deep learning as telling us something about what the model looks like, and basically, to qualify as deep, I say that it just has to have multiple layers. That can be multiple steps in a feed-forward differentiable computation. That can be multiple layers in a graphical model. There are a lot of ways that you could satisfy me that something has multiple steps that are each parameterized separately. I think of gradient descent as being all about that other piece, the how do you actually update the parameters piece. So you can imagine having a deep model like a convolutional net and training it with something like evolution or a genetic algorithm, and I would say that still qualifies as deep learning. And then in terms of models that aren't necessarily differentiable, I guess Boltzmann machines are probably the main example of something where you can't really take a derivative and use that for the learning process, but you can still argue that the model has many steps of processing that it applies when you run inference in the model.

So it's the steps of processing that's key. Geoff Hinton suggests that we need to throw away backpropagation and start all over. What do you think about that? What could an alternative direction of training neural networks look like?

I don't know that backpropagation is going to go away entirely. Most of the time, when we decide that a machine learning algorithm isn't on the critical path to research for improving AI, the algorithm doesn't die. It just becomes used for some
specialized set of things. A lot of algorithms like logistic regression don't seem that exciting to AI researchers who are working on things like speech recognition or autonomous cars today, but there's still a lot of use for logistic regression in things like analyzing really noisy data in medicine and finance, or making really rapid predictions in really time-limited contexts. So I think backpropagation and gradient descent are around to stay, but they may not end up being everything that we need to get to real human-level or superhuman AI.

Are you optimistic about us discovering... You know, backpropagation has been around for a few decades. So are you optimistic about us as a community being able to discover something better?

Yeah, I am. I think we likely will find something that works better. You could imagine things like having stacks of models where some of the lower-level models predict parameters of the higher-level models, and so at the top level you're not learning in terms of literally calculating gradients, but just predicting how different values will perform. You can kind of see that already in some areas like Bayesian optimization, where you have a Gaussian process that predicts how well different parameter values will perform. We already use those kinds of algorithms for things like hyperparameter optimization. And in general we know a lot of things other than backprop that work really well for specific problems. The main thing we haven't found is a way of taking one of these other non-backprop-based algorithms and having it really advance the state of the art on an AI-level problem.

Right.

But I wouldn't be surprised if eventually we find that some of these algorithms, even the ones that already exist, not even necessarily a new one, we might find some way of customizing one of these algorithms to do something really interesting at the level of cognition. I think one system that we really don't have working quite right yet is
like short-term memory. We have things like LSTMs, they're called long short-term memory, but they still don't do quite what a human does with short-term memory. Gradient descent, to learn a specific fact, has to do multiple steps on that fact. If I tell you the meeting today is at 3 p.m., I don't need to say over and over again, it's at 3 p.m., it's at 3 p.m., it's at 3 p.m., it's at 3 p.m., for you to do a gradient step on each one. You just hear it once and you remember it. There's been some work on things like self-attention and attention-like mechanisms, like the neural Turing machine, that can write to memory cells and update themselves with facts like that right away, but I don't think we've really nailed it yet. And that's one area where I'd imagine that new optimization algorithms, or different ways of applying existing optimization algorithms, could give us a way of just lightning-fast updating the state of a machine learning system to contain a specific fact like that, without needing to have it presented over and over and over again.

So some of the success of symbolic systems in the 80s is that they were able to assemble these kinds of facts better, but there's a lot of expert input required, and it's very limited in that sense. Do you ever look back to that as something that we'll have to return to eventually, sort of dust off the book from the shelf and think about how we build knowledge representation, knowledge bases?

Will we have to use graph searches?

Graph searches, right.

And first-order logic and entailment and things like that?

That kind of thing, yeah, exactly.

In my particular line of work, which has mostly been machine learning security and also generative modeling, I haven't usually found myself moving in that direction. For generative models, I could see a little bit of it could be useful if you had something like a differentiable knowledge base, or some other kind of knowledge base where it's possible for some of our fuzzier machine learning algorithms to interact with
the knowledge base.

A neural network is kind of like that. It's a differentiable knowledge base of sorts.

Yeah. But if we had a really easy way of giving feedback to machine learning models, that would clearly help a lot with generative models. And so you could imagine one way of getting there would be to get a lot better at natural language processing, but another way of getting there would be to take some kind of knowledge base and figure out a way for it to actually interact with a neural network.

Being able to have a chat with a neural network.

Yes. So one thing in generative models we see a lot today is you'll get things like faces that are not symmetrical, like people that have two eyes that are different colors. I mean, there are people with eyes that are different colors in real life, but not nearly as many of them as you tend to see in the machine-learning-generated data. So if you had a knowledge base that could contain the fact that people's faces are generally approximately symmetric, and eye color is especially likely to be the same on both sides, being able to just inject that hint into the machine learning model without it having to discover that itself after studying a lot of data would be a really useful feature. I could see a lot of ways of getting there without bringing back some of the 1980s technology, but I also see some ways that you could imagine extending the 1980s technology to play nice with neural nets and have it help get there.

Awesome. So you talked about the story of you coming up with the idea of GANs at a bar with some friends. You were arguing that GANs, generative adversarial networks, would work, and the others didn't think so. Then you went home at midnight, coded it up, and it worked. If I was a friend of yours at the bar, I would also have doubts. It's a really nice idea, but I'm very skeptical that it would work. What was the basis of their skepticism? What was the basis of your intuition why it should work?

I don't want to
be someone who goes around promoting alcohol for the purposes of science, but in this case I do actually think that drinking helped a little bit. When your inhibitions are lowered, you're more willing to try out things that you wouldn't try out otherwise. I have noticed in general that I'm less prone to shooting down some of my own ideas when I have had a little bit to drink. I think if I had had that idea at lunchtime, I probably would have thought: it's hard enough to train one neural net, you can't train a second neural net in the inner loop of the outer neural net. That was basically my friend's objection, that trying to train two neural nets at the same time would be too hard.

So it was more about the training process. My skepticism would be, I'm sure you could train it, but the thing it would converge to would not be able to generate anything reasonable, with any kind of reasonable realism.

Yeah, so part of what all of us were thinking about when we had this conversation was deep Boltzmann machines, which a lot of us in the lab, including me, were big fans of at the time. They involved two separate processes running at the same time. One of them is called the positive phase, where you load data into the model and tell the model to make the data more likely. The other is called the negative phase, where you draw samples from the model and tell the model to make those samples less likely. In a deep Boltzmann machine it's not trivial to generate a sample. You have to actually run an iterative process that gets better and better samples, coming closer and closer to the distribution the model represents. So during the training process you're always running these two systems at the same time, one that's updating the parameters of the model and another one that's trying to generate samples from the model. They worked really well on things like MNIST, but a lot of us in the lab, including me, had tried to get deep Boltzmann machines to scale past MNIST
to things like generating color photos, and we just couldn't get the two processes to stay synchronized. So when I had the idea for GANs, a lot of people thought that the discriminator would have more or less the same problem as the negative phase in the Boltzmann machine: that trying to train the discriminator in the inner loop, you just couldn't get it to keep up with the generator in the outer loop, and that would prevent it from converging to anything useful.

Yeah, I share that intuition.

But it turns out to not be the case. A lot of the time with machine learning algorithms, it's really hard to predict ahead of time how well they'll actually perform. You have to just run the experiment and see what happens. And I would say I still today don't have one factor I can put my finger on and say, this is why GANs worked for photo generation and deep Boltzmann machines don't. There are a lot of theory papers showing that under some theoretical settings the GAN algorithm does actually converge, but those settings are restricted enough that they don't necessarily explain the whole picture in terms of all the results that we see in practice.

So taking a step back, in the same way as we talked about deep learning, can you tell me what generative adversarial networks are?

Yeah, so generative adversarial networks are a particular kind of generative model. A generative model is a machine learning model that can train on some set of data. Say you have a collection of photos of cats and you want to generate more photos of cats, or you want to estimate a probability distribution over cats so you can ask how likely it is that some new image is a photo of a cat. GANs are one way of doing this. Some generative models are good at creating new data. Other generative models are good at estimating that density function and telling you how likely particular pieces of data are to come from the same distribution as the training data. GANs are more focused on generating samples rather than
estimating the density function. There are some kinds of GANs, like Flow-GAN, that can do both, but mostly GANs are about generating samples, generating new photos of cats that look realistic. And they do that completely from scratch. It's analogous to human imagination. When a GAN creates a new image of a cat, it's using a neural network to produce a cat that has not existed before. It isn't doing something like compositing photos together. You're not literally taking the eye off of one cat and the ear off of another cat. It's more of a digestive process, where the neural net trains on a lot of data and comes up with some representation of the probability distribution and generates entirely new cats. There are a lot of different ways of building a generative model. What's specific to GANs is that we have a two-player game, in the game-theoretic sense, and as the players in this game compete, one of them becomes able to generate realistic data. The first player is called the generator. It produces output data, such as images, for example, and at the start of the learning process it'll just produce completely random images. The other player is called the discriminator. The discriminator takes images as input and guesses whether they're real or fake. You train it both on real data, so photos that come from your training set, actual photos of cats, and you train it to say that those are real. You also train it on images that come from the generator network, and you train it to say that those are fake. As the two players compete in this game, the discriminator tries to become better at recognizing whether images are real or fake, and the generator becomes better at fooling the discriminator into thinking that its outputs are real. You can analyze this through the language of game theory and find that there's a Nash equilibrium where the generator has captured the correct probability distribution. So in the cat example, it makes perfectly realistic cat photos, and the
discriminator is unable to do better than random guessing, because all the samples coming from both the data and the generator look equally likely to have come from either source.

So do you ever sit back and does it just blow your mind that this thing works? So it's able to estimate that density function enough to generate realistic images. I mean, do you ever sit back and think, how does this even work? This is quite incredible, especially where GANs have gone in terms of realism.

Yeah, and not just to flatter my own work, but generative models, all of them have this property that if they really did what we asked them to do, they would do nothing but memorize the training data.

Right.

With models that are based on maximizing the likelihood, the way that you obtain the maximum likelihood for a specific training set is you assign all of your probability mass to the training examples and nowhere else. For GANs, the game is played using a training set, so the way that you become unbeatable in the game is you literally memorize training examples. One of my former interns wrote a paper, his name is Vaishnavh Nagarajan, and he showed that it's actually hard for the generator to memorize the training data, hard in a statistical learning theory sense: that you can actually create reasons for why it would require quite a lot of learning steps and a lot of observations of different latent variables before you could memorize the training data. That still doesn't really explain why, when you produce samples that are new, you get compelling images rather than just garbage that's different from the training set. And I don't think we really have a good answer for that, especially if you think about how many possible images are out there and how few images the generative model sees during training. It seems just unreasonable that generative models create new images as well as they do, especially considering that we're basically training them
to memorize rather than generalize. I think part of the answer is there's a paper called Deep Image Prior where they show that you can take a convolutional net and you don't even need to learn the parameters of it at all. You just use the model architecture, and it's already useful for things like inpainting images. I think that shows us that the convolutional network architecture captures something really important about the structure of images, and we don't need to actually use learning to capture all the information coming out of the convolutional net. That would imply that it would be much harder to make generative models in other domains. So far we're able to make reasonable speech models and things like that, but to be honest, we haven't actually explored a whole lot of different datasets all that much. We don't, for example, see a lot of deep learning models of biology datasets, where you have lots of microarrays measuring the amount of different enzymes and things like that. So we may find that some of the progress that we've seen for images and speech turns out to really rely heavily on the model architecture, and we were able to do what we did for vision by trying to reverse-engineer the human visual system. And maybe it'll turn out that we can't just use that same trick for arbitrary kinds of data.

All right, so there are aspects of the human vision system, the hardware of it, that without learning, without cognition, just make it really effective at detecting the patterns we see in the visual world. Yeah, that's really interesting. In a quick overview, in your view, what types of GANs are there, and what other generative models besides GANs are there?

Yeah, so it's maybe a little bit easier to start with what kinds of generative models there are other than GANs. So most generative models are likelihood-based, where to train them you have a model that tells you how much probability it assigns to a particular example, and
you just maximize the probability assigned to all the training examples. It turns out that it's hard to design a model that can create really complicated images or really complicated audio waveforms and still have it be possible to estimate the likelihood function from a computational point of view. Most interesting models that you would just write down intuitively, it turns out that it's almost impossible to calculate the amount of probability they assign to a particular point. So there's a few different schools of generative models in the likelihood family. One approach is to very carefully design the model so that it is computationally tractable to measure the density it assigns to a particular point. So there are things like autoregressive models, like PixelCNN. Those basically break down the probability distribution into a product over every single feature, so for an image you estimate the probability of each pixel given all of the pixels that came before it. There's tricks where, if you want to measure the density function, you can actually calculate the density for all these pixels more or less in parallel. Generating the image still tends to require you to go one pixel at a time, and that can be very slow, but there are again tricks for doing this in a hierarchical pattern where you can keep the runtime under control.

Is the quality of the images it generates, putting runtime aside, pretty good?

They're reasonable, yeah. I would say a lot of the best results are from GANs these days, but it can be hard to tell how much of that is based on who's studying which type of algorithm, if that makes sense.

The amount of effort invested in it.

Yeah, or the kind of expertise. So a lot of people who've traditionally been excited about graphics or art and things like that have gotten interested in GANs, and to some extent it's hard to tell: are GANs doing better because they have a lot of graphics and art experts behind them, or are GANs doing better because they're more
computationally efficient, or are GANs doing better because they prioritize the realism of samples over the accuracy of the density function? I think all of those are potentially valid explanations, and it's hard to tell.

So can you give a brief history of GANs from 2014?

Yeah, so a few highlights. In the first paper we just showed that GANs basically work. If you look back at the samples we had, now they look terrible. On the CIFAR-10 dataset you can't even recognize objects in them.

Your paper used CIFAR-10?

We used MNIST, which is little handwritten digits. We used the Toronto Face Database, which is small grayscale photos of faces. We did have recognizable faces; my colleague Bing Xu put together the first GAN face model for that paper. We also had the CIFAR-10 dataset, which is things like very small 32 by 32 pixel images of cars and cats and dogs. For that we didn't get recognizable objects, but all the deep learning people back then were really used to looking at these failed samples and kind of reading them like tea leaves.

Right.

And people who are used to reading the tea leaves recognized that our tea leaves at least looked different.

Right.

Maybe not necessarily better, but there was something unusual about them, and that got a lot of us excited. One of the next really big steps was LAPGAN by Emily Denton and Soumith Chintala at Facebook AI Research, where they actually got really good high-resolution photos working with GANs for the first time. They had a complicated system where they generated the image starting at low res and then scaling up to high res, but they were able to get it to work. And then in 2015, I believe later that same year, Alec Radford and Soumith Chintala and Luke Metz published the DCGAN paper, which stands for deep convolutional GAN. It's kind of a non-unique name, because these days basically all GANs, and even some before that, were deep and convolutional, but they just kind of picked a name for a really great recipe where they were able
to actually, using only one model instead of a multi-step process, generate realistic images of faces and things like that. That was sort of like the beginning of the Cambrian explosion of GANs. Like, once you got animals that had a backbone, you suddenly got lots of different versions of, you know, fish and four-legged animals and things like that. So DCGAN became kind of the backbone for many different models that came out.

Used as a baseline even still.

Yeah. And so from there I would say some interesting things we've seen are: there's a lot you can say about how just the quality of standard image generation GANs has increased, but what's also maybe more interesting on an intellectual level is how the things you can use GANs for have also changed. One thing is that you can use them to learn classifiers without having to have class labels for every example in your training set. So that's called semi-supervised learning. My colleague at OpenAI, Tim Salimans, who's at Brain now, wrote a paper called Improved Techniques for Training GANs. I'm a co-author on this paper, but I can't claim any credit for this particular part. One thing he showed in the paper is that you can take the GAN discriminator and use it as a classifier that actually tells you this image is a cat, this image is a dog, this image is a car, this image is a truck, and so on. So not just to say whether the image is real or fake, but if it is real, to say specifically what kind of object it is. And he found that you can train these classifiers with far fewer labeled examples than traditional classifiers.

So if you supervise based on not just your discrimination ability but your ability to classify, you're going to converge much faster to being effective at being a discriminator.

Yeah. So for example, for the MNIST dataset, you want to look at an image of a handwritten digit and say whether it's a 0, a 1, or 2, and so on. To get down to less
than 1% error required around 60,000 examples until maybe about 2014 or so. In 2016, with this semi-supervised GAN project, Tim was able to get below 1% error using only a hundred labeled examples. So that was about a 600x decrease in the amount of labels that he needed. He's still using more images than that, but he doesn't need to have each of them labeled as, you know, this one's a 1, this one's a 2, this one's a 0, and so on.

Then for GANs to be able to generate recognizable objects, so objects from a particular class, you still need labeled data, because you need to know what it means to be a particular class, cat, dog. How do you think we can move away from that?

Yeah, some researchers at Brain Zurich actually just released a really great paper on semi-supervised GANs, where their goal isn't to classify, it's to make recognizable objects despite not having a lot of labeled data. They were working off of DeepMind's BigGAN project, and they showed that they can match the performance of BigGAN using only 10%, I believe, of the labels. BigGAN was trained on the ImageNet dataset, which is about 1.2 million images, and had all of them labeled. This latest project from Brain Zurich shows that they're able to get away with only having about 10% of the images labeled. And they do that essentially using a clustering algorithm, where the discriminator learns to assign the objects to groups, and then this understanding that objects can be grouped into similar types helps it to form more realistic ideas of what should be appearing in the image, because it knows that every image it creates has to come from one of these archetypal groups rather than just being some arbitrary image. If you train a GAN with no class labels, you tend to get things that look sort of like grass or water or brick or dirt, but without necessarily a lot going on in them. And I think that's partly because if you look at a large ImageNet image, the object doesn't necessarily occupy the
whole image, and so you learn to create realistic sets of pixels, but you don't necessarily learn that the object is the star of the show and you want it to be in every image you make.

Yeah, I've heard you talk about the horse-to-zebra CycleGAN mapping and how it turns out, again thought-provoking, that horses are usually on grass and zebras are usually on drier terrain, so when you're doing that kind of generation you're going to end up generating greener horses or whatever. So those are connected together. You're not able to segment. So are there other types of games you come across in your mind that neural networks can play with each other to be able to solve problems?

Yeah, the one that I spend most of my time on is in security. You can model most interactions as a game where there's attackers trying to break your system and you're the defender trying to build a resilient system. There's also domain-adversarial learning, which is an approach to domain adaptation that looks really a lot like GANs. The authors had the idea before the GAN paper came out. Their paper came out a little bit later, and they were very nice and cited the GAN paper, but I know that they actually had the idea before it came out. Domain adaptation is when you want to train a machine learning model in one setting, called a domain, and then deploy it in another domain later, and you would like it to perform well in the new domain even though the new domain is different from how it was trained. So for example, you might want to train on a really clean image dataset like ImageNet, but then deploy on users' phones, where the user is taking pictures in the dark or pictures while moving quickly and just pictures that aren't really centered or composed all that well. When you take a normal machine learning model, it often degrades really badly when you move to the new domain, because it looks so different
from what the model was trained on. Domain adaptation algorithms try to smooth out that gap, and the domain-adversarial approach is based on training a feature extractor where the features have the same statistics regardless of which domain you extracted them on. So in the domain-adversarial game you have one player that's a feature extractor and another player that's a domain recognizer. The domain recognizer wants to look at the output of the feature extractor and guess which of the two domains the features came from, so it's a lot like the real-versus-fake discriminator in GANs. And then the feature extractor you can think of as loosely analogous to the generator in GANs, except what it's trying to do here is both fool the domain recognizer into not knowing which domain the data came from and also extract features that are good for classification. So at the end of the day, in the cases where it works out, you can actually get features that work about the same in both domains. Sometimes this has a drawback where, in order to make things work the same in both domains, it just gets worse at the first one, but there are a lot of cases where it actually works out well on both.

Do you think of GANs being useful in the context of data augmentation?

Yeah, one thing you could hope for with GANs is you could imagine: I've got a limited training set and I'd like to make more training data to train something else, like a classifier. You could train the GAN on the training set and then create more data, and then maybe the classifier would perform better on the test set after training on this bigger GAN-generated dataset. So that's the simplest version of something you might hope would work. I've never heard of that particular approach working, but I think there's some closely related things that I think could work in the future, and some that actually already have worked. So if you think a little bit about what we'd be hoping for if we use the GAN to make more
training data, we're hoping that the GAN will generalize to new examples better than the classifier would have generalized if it was trained on the same data. And I don't know of any reason to believe that the GAN would generalize better than the classifier would. But what we might hope for is that the GAN could generalize differently from a specific classifier. So one thing I think is worth trying, that I haven't personally tried but someone could try, is: what if you trained a whole lot of different generative models on the same training set, created samples from all of them, and then trained a classifier on that? Because each of the generative models might generalize in a slightly different way, they might capture many different axes of variation that one individual model wouldn't, and then the classifier can capture all of those ideas by training on all of their data. So it'd be a little bit like making an ensemble of classifiers.

An ensemble of GANs, in a way.

Yeah, I think that could generalize better. The other thing that GANs are really good for is not necessarily generating new data that's exactly like what you already have, but generating new data that has different properties from the data you already had. One thing that you can do is you can create differentially private data. So suppose that you have something like medical records, and you don't want to train a classifier on the medical records and then publish the classifier, because someone might be able to reverse-engineer some of the medical records you trained on. There's a paper from Casey Greene's lab that shows how you can train a GAN using differential privacy, and then the samples from the GAN still have the same differential privacy guarantees as the parameters of the GAN. So you can make fake patient data for other researchers to use, and they can do almost anything they want with that data, because it doesn't come from real people, and the differential privacy mechanism gives you clear guarantees on how much the
original people's data has been protected.

That's really interesting, actually. I haven't heard you talk about that before. In terms of fairness, I've seen your talk from AAAI: how can adversarial machine learning help models be more fair with respect to sensitive variables?

Yeah, there was a paper from Amos Storkey's lab about how to learn machine learning models that are incapable of using specific variables. So say, for example, you wanted to make predictions that are not affected by gender. It isn't enough to just leave gender out of the input to the model. You can often infer gender from a lot of other characteristics. Like, say that you have the person's name, but you're not told their gender. Well, if their name is Ian, they're kind of obviously a man. So what you'd like to do is make a machine learning model that can still take in a lot of different attributes and make a really accurate, informed prediction, but be confident that it isn't reverse-engineering gender or another sensitive variable internally. You can do that using something very similar to the domain-adversarial approach, where you have one player that's a feature extractor and another player that's a feature analyzer, and you want to make sure that the feature analyzer is not able to guess the value of the sensitive variable that you're trying to keep private.

Right, yeah, I love this approach. So with the features, you're not able to infer the sensitive variables. Brilliant. It's quite brilliant and simple, actually.

Another way I think that GANs in particular could be used for fairness would be to make something like a CycleGAN, where you can take data from one domain and convert it into another. We've seen CycleGAN turning horses into zebras. We've seen other unsupervised GANs made by Ming-Yu Liu doing things like turning day photos into night photos. I think for fairness you could imagine taking records for people in one group and transforming them into
analogous people in another group and testing to see if they're treated equitably across those two groups. There's a lot of things that'd be hard to get right to make sure that the conversion process itself is fair, and I don't think it's anywhere near something that we could actually use yet. But if you could design that conversion process very carefully, it might give you a way of doing audits where you say: what if we took people from this group, converted them into equivalent people in another group, does the system actually treat them how it ought to?

That's also really interesting. You know, in the popular press and in general in our imagination, you think, well, GANs are able to generate data, and you start to think about deepfakes, or being able to sort of maliciously generate data that fakes the identity of other people. Is this something of a concern to you? Is this something, if you look 10, 20 years into the future, is that something that pops up in your work, in the work of the community that's working on generative models?

I'm a lot less concerned about 20 years from now than the next few years. I think there will be a kind of bumpy cultural transition as people encounter this idea that there can be very realistic videos and audio that aren't real. I think 20 years from now people will mostly understand that you shouldn't believe something is real just because you saw a video of it. People will expect to see that it's been cryptographically signed, or have some other mechanism to make them believe the content is real. There's already people working on this. Like, there's a startup called Truepic that provides a lot of mechanisms for authenticating that an image is real. They're maybe not quite up to having a state actor try to evade their verification techniques, but it's something people are already working on, and I think we'll get it right eventually.

So you think authentication will eventually win out? So being able to authenticate that this is
real and this is not, as opposed to GANs just getting better and better, or generative models being able to get better and better, to where the nature of what is real?

I don't think we'll ever be able to look at the pixels of a photo and tell you for sure that it's real or not real, and I think it would actually be somewhat dangerous to rely on that approach too much. If you make a really good fake detector and then someone's able to fool your fake detector, and your fake detector says this image is not fake, then it's even more credible than if you'd never made a fake detector in the first place. What I do think we'll get to is systems that we can kind of use behind the scenes to make estimates of what's going on, and maybe not use them in court for a definitive analysis. I also think we will likely get better authentication systems, where, imagine that every phone cryptographically signs everything that comes out of it. You wouldn't be able to conclusively tell that an image was real, but you would be able to tell that somebody who knew the appropriate private key for this phone was actually able to sign this image and upload it to this server at this timestamp. So you could imagine maybe you make phones that have the private keys hardware-embedded in them. If, like, a state security agency really wants to infiltrate the company, they could probably plant a private key of their choice, or break open the chip and learn the private key, or something like that. But it would make it a lot harder for an adversary with fewer resources to fake things.

For most of us, yeah. Okay. So you mentioned the beer and the bar and the new ideas. You were able to come up with this new idea pretty quickly and implement it pretty quickly. Do you think there are still many such groundbreaking ideas in deep learning that could be developed so quickly?

Yeah, I do think that there are a lot of ideas that can be developed really quickly. GANs were probably a little bit of an
outlier on the whole, like, one-hour timescale. But just in terms of low-resource ideas where you do something really different on the algorithm scale and get a big payback, I think it's not as likely that you'll see that in terms of things like core machine learning technologies, like a better classifier or a better reinforcement learning algorithm or a better generative model. If I had the GAN idea today, it would be a lot harder to prove that it was useful than it was back in 2014, because I would need to get it running on something like ImageNet or CelebA at high resolution. You know, those take a while to train. You couldn't train it in an hour and know that it was something really new and exciting. Back in 2014, training on MNIST was enough. But there are other areas of machine learning where I think a new idea could actually be developed really quickly with low resources.

What's your intuition about what areas of machine learning are ripe for this?

Yeah, so I think fairness and interpretability are areas where we just really don't have any idea how anything should be done yet. Like for interpretability, I don't think we even have the right definitions, and even just defining a really useful concept, you don't even need to run any experiments, could have a huge impact on the field. We've seen that, for example, in differential privacy, that Cynthia Dwork and her collaborators made this technical definition of privacy, where before a lot of things were really mushy, and then with that definition you could actually design randomized algorithms for accessing databases and guarantee that they preserved individual people's privacy in a mathematical, quantitative sense. Right now we all talk a lot about how interpretable different machine learning algorithms are, but it's really just people's opinion, and everybody probably has a different idea of what interpretability means in their head. If we could define some concept related to interpretability that's
actually measurable, that would be a huge leap forward even without a new algorithm that increases that quantity. And also, once we had the definition of differential privacy, it was fast to get the algorithms that guaranteed it. So you could imagine once we have definitions of good concepts in interpretability, we might be able to provide the algorithms that have the interpretability guarantees quickly too.

What do you think it takes to build a system with human-level intelligence, as we quickly venture into the philosophical? So, artificial general intelligence, what do you think?

I think that it definitely takes better environments than we currently have for training agents, that we want them to have a really wide diversity of experiences. I also think it's going to take really a lot of computation. It's hard to imagine exactly how much.

So you're optimistic about simulation, simulating a variety of environments, as the path forward?

I think it's a necessary ingredient, yeah. I don't think that we're going to get to artificial general intelligence by training on fixed datasets or by thinking really hard about the problem. I think that the agent really needs to interact and have a variety of experiences within the same lifespan. And today we have many different models that can each do one thing, and we tend to train them on one dataset or one RL environment. Sometimes there are actually papers about getting one set of parameters to perform well in many different RL environments, but we don't really have anything like an agent that goes seamlessly from one type of experience to another and really integrates all the different things that it does over the course of its life. When we do see multi-environment agents, they tend to be similar environments, like all of them are playing an action-based video game. We don't really have an agent that goes from playing a video game to reading The Wall Street
Journal to predicting how effective a molecule will be as a drug, or something like that.

What do you think is a good test for intelligence, in your view? There's been a lot of benchmarks, starting with Alan Turing, natural conversation being a good benchmark for intelligence. What would Ian Goodfellow sit back and be really damn impressed by, if a system was able to accomplish it?

Something that doesn't take a lot of glue from human engineers. So imagine that instead of having to go to the CIFAR website and download CIFAR-10 and then write a Python script to parse it and all that, you could just point an agent at the CIFAR-10 problem, and it downloads and extracts the data and trains a model and starts giving you predictions. I feel like something that doesn't need to have every step of the pipeline assembled for it, it definitely understands what it's doing.

Is AutoML moving into that direction, or are you thinking way even bigger?

AutoML has mostly been moving toward: once we've built all the glue, can the machine learning system design the architecture really well? And so I'm more saying, like, if something knows how to pre-process the data so that it successfully accomplishes the task, then it would be very hard to argue that it doesn't truly understand the task in some fundamental sense. And I don't necessarily know that that's the philosophical definition of intelligence, but that's something that would be really cool to build, that would be really useful and would impress me and would convince me that we've made a step forward in real AI.

So you give it the URL for Wikipedia, and then the next day expect it to be able to solve CIFAR-10?

Or you type in a paragraph explaining what you want it to do, and it figures out what web searches it should run and downloads all the necessary ingredients.

So you have a very clear, calm way of speaking, no ums, easy to edit. I've seen comments where both you and I have been identified as both potentially being
robots. If you have to prove to the world that you are indeed human, how would you do it?

I can understand thinking that I'm a robot.

It's the flip side of the Turing test, I think.

Yeah, the prove-you're-human test.

I mean, is there something that's truly unique in your mind? Or does it go back to just natural language again?

So proving that I'm not a robot with today's technology, that's pretty straightforward. Like, my conversation today hasn't veered off into talking about the stock market or something because it's in my training data. But I think it's more generally: trying to prove that something is real from the content alone is incredibly hard. That's one of the main things I've gotten out of my GAN research, that you can simulate almost anything, and so you have to really step back to a separate channel to prove that something is real. So I guess I should have had myself stamped on a blockchain when I was born or something, but I didn't do that, so according to my own research methodology there's just no way to know at this point.

So, last question: what problem stands out for you that you're really excited about challenging in the near future?

So I think resistance to adversarial examples, figuring out how to make machine learning secure against an adversary who wants to interfere with it and control it, is one of the most important things researchers today could solve.

In all domains: image, language, driving?

I guess I'm most concerned about domains we haven't really encountered yet. Like, imagine twenty years from now, when we're using advanced AIs to do things we haven't even thought of yet. Like, if you ask people what are the important problems in security of phones in, like, 2002, I don't think we would have anticipated that we're using them for nearly as many things as we're using them for today. I think it's going to be like that with AI, that you can kind of try to
speculate about where it's going, but really the business opportunities that end up taking off would be hard to predict ahead of time. What you can predict ahead of time is that almost anything you can do with machine learning, you would like to make sure that people can't get it to do what they want rather than what you want just by showing it a funny QR code or a funny input pattern.

And you think that the set of methodology to do that can be bigger than any one domain?

I think so, yeah. Like, one methodology that I think is, not a specific methodology, but a category of solutions that I'm excited about today is making dynamic models that change every time they make a prediction. So right now we tend to train models, and then after they're trained we freeze them, and we just use the same rule to classify everything that comes in from then on. That's really a sitting duck from a security point of view. If you always output the same answer for the same input, then people can just run inputs through until they find a mistake that benefits them, and then they use the same mistake over and over and over again. I think having a model that updates its predictions so that it's harder to predict what you're going to get will make it harder for an adversary to really take control of the system and make it do what they want it to do.

Yeah, models that maintain a bit of a sense of mystery about them, because they always keep changing.

Yeah.

And thanks so much for talking today, it was awesome.

Thank you for coming in. It's great to see you.
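
The Nash equilibrium described earlier in the conversation (a generator that matches the data distribution, and a discriminator reduced to random guessing) can be checked numerically on a toy example. What follows is a minimal sketch, not code from the interview: it assumes the value function from the original 2014 GAN paper, V(D, G) = E_data[log D(x)] + E_gen[log(1 - D(x))], and uses made-up three-outcome distributions in place of images. The function names and the numbers are hypothetical.

```python
import math


def optimal_discriminator(p_data, p_gen):
    """Best response of the discriminator to a fixed generator.

    For each outcome x this is D*(x) = p_data(x) / (p_data(x) + p_gen(x)),
    the probability that x came from the real data rather than the generator.
    Assumes every outcome has nonzero mass under at least one distribution.
    """
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_gen)]


def game_value(p_data, p_gen, d):
    """V(D, G) = E_{x~data}[log D(x)] + E_{x~gen}[log(1 - D(x))]."""
    real_term = sum(pd * math.log(dx) for pd, dx in zip(p_data, d) if pd > 0)
    fake_term = sum(pg * math.log(1.0 - dx) for pg, dx in zip(p_gen, d) if pg > 0)
    return real_term + fake_term


# Toy "data distribution" over three outcomes (illustrative numbers only).
p_data = [0.5, 0.3, 0.2]

# A generator that has not yet matched the data, and one that has.
p_mismatched = [0.2, 0.3, 0.5]
p_matched = list(p_data)

d_mismatched = optimal_discriminator(p_data, p_mismatched)
d_matched = optimal_discriminator(p_data, p_matched)

v_mismatched = game_value(p_data, p_mismatched, d_mismatched)
v_matched = game_value(p_data, p_matched, d_matched)

print("D* vs mismatched generator:", d_mismatched)
print("D* vs matched generator:   ", d_matched)
print("V vs mismatched generator: ", v_mismatched)
print("V vs matched generator:    ", v_matched, "(-log 4 =", -math.log(4), ")")
```

When the generator matches the data, the best discriminator outputs 0.5 for every outcome and the value of the game drops to -log 4, its minimum over generators. Against the mismatched generator the discriminator can still tell the two sources apart, so the value is higher. Real GANs approximate both players with neural networks and alternate gradient steps, which this sketch does not attempt to show.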
Info
Channel: Lex Fridman
Views: 164,645
Rating: 4.9513202 out of 5
Id: Z6rxFNMGdn0
Length: 68min 37sec (4117 seconds)
Published: Thu Apr 18 2019