Generative Deep Learning - The Key To Unlocking Artificial General Intelligence Meetup #LondonAI

Video Statistics and Information

Captions
My name is David Foster, and I'm the author of a book called Generative Deep Learning. I decided about a year and a half ago that I wanted to write a book about something I felt was going to be absolutely huge in the next one, two, three years, and that has certainly come true. It's in the public eye more than ever now, with things like deepfakes, and with things like GPT-2, the text-generation model released by OpenAI, and what that could mean for things like fake news. We have a couple of copies here this evening, so get thinking of questions; there'll be a giveaway at the end for the best question. It's a first edition — they're all first editions, because I haven't written a second edition yet — but if you put it on eBay, you get more if you say it's a first edition.

So, Generative Deep Learning. This book really came out of the desire to write about something that most of us, as data scientists, don't get the chance to build as part of our daily work. You may build machine learning models, for sure, but generative deep learning models are kind of a hobbyist topic at the moment. In years to come that may change, but I wanted to write a book that was just about the pure fascination of building things that make machines seem human — whether that's text generation, image generation, or music generation. This is a subject that just captures the imagination.

When I'm not writing this book, I'm the co-founder of a company called Applied Data Science Partners. We're a London-based data science consultancy; please do visit our website at adsp.ai, where you'll see some of the case studies of things that we do — I know you're all on the Wi-Fi right now, so do check it out. We build bespoke machine learning and data science solutions for companies, so we're not a platform; we just hire fantastic data scientists, data engineers, and data analysts. We are hiring as well, so if any of you are looking for a move, please do check out the website for the job specs.

The subject of this talk: we go on a real expedition through generative deep learning, right the way from first principles — we don't assume any knowledge of what generative deep learning is — all the way through to cutting-edge GDL today. We'll cover five topics, with Python zero-indexing. In the intro to generative deep learning we'll cover what we're trying to achieve. We'll then cover something called a variational autoencoder, which I think is a really great entry point into this subject — it's very easy to get bogged down by GANs quite quickly if that's the first thing you come across with generative deep learning, so if you're getting started, I'd recommend starting with VAEs. We will cover GANs as well in this talk. Then, before moving on to something a bit more speculative, we'll look at the World Models paper by David Ha and Jürgen Schmidhuber, which came out in 2018. This was a fascinating example of not just using generative deep learning to create something, but actually using it within a reinforcement learning setting: they showed that it was possible for a machine to learn within its own dreams of what the environment might do in the future, and that involved a variational autoencoder, so we'll take a look at that.
I want to finish with something that's going to get you thinking on the way home on the tube, about what this might mean for artificial general intelligence. It's a subject everyone talks a lot about, but it's so intangible — we don't really know what it means at the moment. I think generative deep learning is something we can get a grasp of, and something that will form the basis of endeavours into AGI in the future, so we'll talk speculatively about what that might mean.

We're going to start with the intro to GDL. What is it? Basically, generative modelling is where we're trying to create new examples of some training set. You can see here, down on the bottom left, we might have a training set of images — CelebA would be a typical example — and in this video on the bottom right we're seeing the output from a GAN, a generative adversarial network, called StyleGAN, which is trying to create new examples of something that may have come from this training set. Notice this isn't a discriminative learning problem: we're not trying to label images here, we're trying to create new ones altogether, which is in itself a much, much harder problem, as we will see.

This field has been evolving over some time, but particularly in the last five or six years, since the invention of the GAN, the rate of progress has been astonishing. The image on the right is a real image generated by a machine, of something that might have come from the celebrity dataset — from StyleGAN2, which was released by NVIDIA in late 2019.

When we talk about generative modelling, the obvious comparison to make is with discriminative modelling, where we're trying to, say, put an emotion label on an image. You would give the model this image; it would pass through a convolutional neural network, usually, to produce in this case five possible responses — shock, happiness, anger, and so on — and these would be numbers that the model outputs. The whole point is that you want these numbers to be as accurate as possible, and there's a well-defined metric you can use to tell, because the dataset is labelled: in the training set there are a certain number of images that are happy faces, a certain number that are angry, and so on.

With generative modelling we don't have that luxury. Generative modelling works on unlabelled data: you just have the images themselves, and the model has to work out what it is about those images that makes them belong to that set, without ever having seen anything outside the set. You can think of it as everything being labelled "this is in the set", and the task is: find me something else that would belong to this set. A much more difficult problem.

Mathematically, what we're trying to do is this: we have an unlabelled dataset on the left-hand side — just two dimensions here, x1 and x2, to keep it a toy example — and we're trying to find the underlying distribution of this dataset, p(X). What we want to do with this p(X) is sample from it.
We all know how to sample from something like a normal distribution, but we want to sample from this p(X) distribution to generate a new data point inside the set — it might say, "I think (-0.6, -0.8) belongs to this set." Again, just to make the point: the discriminative model is not trying to do this. It is trying to predict p(Y|X), where Y is our response column — the emotion of the image, for example. So: what's the probability of a face being happy, given this image? And you can see the difference in the training sets: the right-hand set is labelled, and the generative modelling dataset is not.

Let's play with this toy example. Here we have a set of points that I have generated according to a rule. Does anyone want to hazard a guess what the rule is? First of all, has anyone seen this in the book? Because it's also in the book. Okay, you're not allowed to answer — great, one person's got the book, awesome. Anyone else want to hazard a guess as to how this dataset has been generated? No? That's no problem, because it's quite difficult.

Here is a generative model that we could build to sample other data points from this two-dimensional grid, and you would be perfectly reasonable to suggest it as a generative model, because we can sample from it — very important for a generative model. We can pick a data point within this box and never pick one outside the box. Mathematically, what we're doing is putting a uniform distribution within the box, with zero probability outside the box. So that's the generative model. But as you can see, here is the true data-generating distribution — this is what we were trying to model. Obviously I knew this up front, because I'm playing God here and saying "yes, this is the true data-generating distribution", but in real life you don't know this; you don't know what the true face-generating distribution is. And you can see there are some instances where the model gets it very wrong. For example, here is a point that just isn't in the data-generating distribution, but our model is estimating that it is. Equally, there are some points — up in Alaska — where the true distribution says this region is in the distribution, but our model, the one we have produced, would never pick something in this top-left-hand corner.

What we're trying to do with generative modelling is make these two match as closely as possible, so that our model never produces something that the human eye would notice is outside the distribution — a face that doesn't look like a face — and equally so that it captures every kind of face: it doesn't just produce, say, male faces, but also female faces, if they are also in the dataset. This is a really key point to remember: whatever generative modelling you're doing, whether it's with text, images, or music, this is ultimately the whole point of the exercise.
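To make the toy box model concrete, here is a minimal NumPy sketch of that idea: a uniform distribution fitted over the bounding box of some observed points, with equal density inside and zero density outside. The training points here are a stand-in; the bounds are whatever the data dictates.

```python
import numpy as np

# Toy "generative model": a uniform distribution over the bounding box of
# the training points. Everything inside the box gets equal density,
# everything outside gets zero. (The stand-in data here is itself random.)
rng = np.random.default_rng(0)
train = rng.uniform(low=[-1.0, -1.0], high=[1.0, 1.0], size=(100, 2))

lo, hi = train.min(axis=0), train.max(axis=0)   # fit the box to the data

def sample(n):
    """Draw n new points from the fitted box model."""
    return rng.uniform(low=lo, high=hi, size=(n, 2))

def density(x):
    """p(x): uniform inside the box, zero outside."""
    inside = np.all((x >= lo) & (x <= hi), axis=-1)
    volume = np.prod(hi - lo)
    return np.where(inside, 1.0 / volume, 0.0)

new_points = sample(5)
print(new_points, density(new_points))
```

The mismatch described in the talk falls straight out of this sketch: the box assigns positive density to corners the true distribution never visits, and (if the box is fitted too tightly) zero density to regions it does.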
Why is it difficult? The problem is that you have a huge, high-dimensional dataset — millions of dimensions, not just the two we've just seen. You've got maybe a thousand by a thousand pixels, and for every single one of those pixels you need an RGB value, so that's another factor of three you're multiplying by. We have to find the needle in a haystack here, because there are so many points in this space that are obviously not faces, and only a tiny, tiny fraction that are. So there are two problems we come up against. One is the one I've just mentioned: real observations are incredibly sparse in this space. The other is that there is a complex dependency between features — and the features here are the individual pixels. How does the model know, or how should it know, that if a pixel in the top-right corner is green, because the subject is on a green background, that green should be carried across to the other side of the image as well? So here's the problem: we've got this incredibly vast expanse of space in which to find true observations from the dataset, and even when we think we've found one that looks decent, a human eye can tell that the right eye is brown and the left eye is blue, "so I know this isn't real". The model needs to capture this very complex dependency across pixels. Deep learning is where we've excelled recently, because it solves both of these problems, or at least goes a long way towards solving them, as we saw with StyleGAN2.

We're not going to start with GANs; we're going to start with variational autoencoders. Hopefully, by the end of this section you'll all be experts in how to build them and what they're trying to achieve — but more importantly, in why we're taking this approach to generative deep learning.

I want you to look at this dataset here. This is a dataset of cylinders, obviously, and I want you, as humans, to think: how do I generalise what these cylinders are? Am I looking at the individual pixel values? Am I looking at the colours? What am I trying to do when I look at this dataset? I would imagine most of you have realised that there are two features that are important here: the height and the width (which, for a cylinder, is naturally also the depth). We want our model to do this as well. We want it to look at this dataset of cylinders and realise there are two dimensions in which they can be embedded, and those two dimensions are, as you can see, the width on the horizontal axis and the height on the vertical axis. Crucially, as we said earlier, not only can these cylinders be embedded into that space, but equally any point in that space decodes to a cylinder that we haven't seen — and that's what generative learning is all about. It's about finding data points that we haven't seen that still belong to the same set.

What we're doing as humans, then, is saying: we can decode those two numbers, height and width, into a picture of a cylinder, and given some cylinder, we can encode it into a two-dimensional latent representation. I'll say "latent representation" a lot in this talk: it is basically a lower-dimensional representation of what the image is. In this case the latent space is two dimensions big — height and width — but in something like face generation it can be maybe 200 dimensions big. (We'll take questions at the end.)

So, this latent space — let's imagine it's two dimensions. You can see here the cylinder on the left-hand side being encoded into the latent space; we then pick up the point in the latent space and decode it back into pixel space. What we need in this variational autoencoder is two models: an encoder and a decoder. Both of these are neural networks.
I'm not going to go into how you train neural networks in this talk, but the gist is that you show the network lots of examples, it backpropagates any error through the network, and over time the weights adjust to make less error. The error in this case is the difference between the image and the image after it has been passed all the way through the network: we take this pixel image, pass it through the encoder so it's now just a two-dimensional vector, decode that vector back into pixel space, and ask how similar the two images are. If the encoder is doing its job well, it should be encoding what this image is into two dimensions, so that the decoder can say, "okay, I know what that is; it's something I decode to look like this." If it's not doing its job well, there's a disconnect between the two, and the model will train itself to be better over time. The loss here might be something like root mean squared error between the two images, where you simply take the values of individual pixels and ask what the RMSE is between the two; you want to get that as low as possible over time. You train this over many epochs by showing it images from the dataset, and notice there's no label here: you're training it on the image itself, so it's learning just from the unlabelled data.

What you get — it's on the left here, actually — is this. This is a dataset you've probably seen before if you've done any kind of deep learning; it's everywhere. It's called the MNIST dataset. When you train one of these encoder–decoder models on it to reproduce things that still look like digits, and you encode everything in the dataset, you get something that looks like this. It's in two dimensions, because we're encoding into a two-dimensional latent space, and every single point here is an image. What I've done is colour these by the label that was in the MNIST dataset — we haven't trained on that label; it's just being used to colour the points. You can see, quite cleverly, what the encoder has done is try to separate out things that look the same: all of the ones are grouped together, all of the sixes are grouped together, all of the twos are grouped together. It's doing that to give the decoder an easier job at the other end, because the decoder has to take one of these points and try to reproduce the image.

But there are two problems here — because this is actually not a variational autoencoder, this is an autoencoder. Autoencoders have the problem that the sample space is really poorly defined. First of all, if I were to pick a new point in this space, am I allowed to pick (100, -50)? If not, why not? There's nothing stopping me from picking that point — and if I did pick that point in the latent space, am I guaranteed that it will decode to something sensible? Autoencoders don't really answer that question. You can see here as well that if I just take three points at random in this space and ask the decoder to decode them, you get some fuzziness, with no real continuity between points in the space. If you ask the decoder to go from this orange region here and decode points towards the blue region, is there anything really to tell it that it needs to move smoothly between those two points and gradually merge, say, a 1 into a 7? Not really.
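To make the encoder–decoder idea concrete, here is a minimal sketch of a plain autoencoder (not yet variational) in Keras, trained on MNIST. The layer sizes, optimiser, and epoch count are illustrative choices, not the talk's or the book's exact model.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Encoder: 28x28 image -> 2-D latent vector.
encoder = keras.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(128, activation="relu"),
    layers.Dense(2),                                  # the 2-D latent space
])

# Decoder: 2-D latent vector -> 28x28 image.
decoder = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(2,)),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28, 1)),
])

# Chain them and train on reconstruction error only -- no labels involved.
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")

(x_train, _), _ = keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
autoencoder.fit(x_train, x_train, epochs=5, batch_size=128)
```

Note that the inputs and the targets are the same images: the labels are loaded and immediately discarded, which is exactly the "learning from unlabelled data" point made above.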
The problem with this is that we need it to be able to do exactly that, because if we're going to produce something like StyleGAN, where it merges between facial images, we don't want it to become discontinuous halfway through and produce complete noise, because that's a point we may sample as a face. So we need local continuity, number one, and we need the sample space to be well defined, so that we have a justifiable region into which we can sample, without this problem of infinite extent in every direction.

The solution is the variational autoencoder. This came along surprisingly recently — all of these things are recent in the grand scheme of things — but it was one of the sparks that generated the revolution in generative deep learning. What they realised was that if we include another term in the loss function, called the KL divergence (which a previous speaker has already mentioned), then instead of mapping an image to one single point in the latent space, we map it to a normal distribution in the latent space, with a mean and a standard deviation. You can think of this as a fuzzy region being mapped to in the latent space, and we want that distribution to be as close to a standard normal as possible — standard normal meaning zero mean and standard deviation of one.

Why do we want that? Firstly, it solves the sampling problem we mentioned before. Now we can just sample from standard normals to generate new points; we don't have the problem of the infinite space we could possibly sample from — we have a well-defined, known distribution, the normal distribution, to sample from when we push something through the decoder. Secondly, it solves the local continuity problem, because now, when the decoder samples a point, it could be anywhere in that fuzzy region around where the encoder has pushed it. That's really important, because the fuzziness almost creates the necessity for the decoder to be good at decoding not just this exact point, but nearby points too, which need to decode to something similar. That was the key to variational autoencoders becoming one of the first examples of models that got spookily good at things like generating faces.

As I mentioned, the loss function is now basically a sum of the root mean squared error and the KL divergence, where the KL divergence just says: make sure the encoded distributions are very close to the standard normal distribution. And what do we get now, if we look at the scatter plot of the encoded points? Everything sits around the standard normal, which is exactly what we want. Everything is being squashed in near the origin, so you can sample from this two-dimensional normal distribution and get something that is very, very close to what a digit might look like. You can move a little bit to the left and it will produce something that's a little bit different — not completely different — which is exactly what we want. And you can see how densely populated this space is: we don't have the vast areas of white space that we had before.
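As a sketch of the two ingredients that turn the autoencoder above into a VAE — the reparameterisation trick and the KL penalty — here is some illustrative TensorFlow code. Here `mu` and `log_var` are assumed to be the two outputs of the encoder, and the `beta` weighting is an illustrative knob, not a value from the talk.

```python
import tensorflow as tf

def sample_z(mu, log_var):
    """Reparameterisation trick: draw z ~ N(mu, sigma^2) differentiably."""
    eps = tf.random.normal(shape=tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps

def kl_loss(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.

    This is the term that squashes every encoded distribution towards
    the standard normal, giving a well-defined region to sample from.
    """
    return -0.5 * tf.reduce_sum(
        1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=1)

def vae_loss(x, x_reconstructed, mu, log_var, beta=1.0):
    """Reconstruction error plus the KL penalty (assumes 4-D image batches)."""
    rmse = tf.sqrt(tf.reduce_mean(
        tf.square(x - x_reconstructed), axis=[1, 2, 3]))
    return tf.reduce_mean(rmse + beta * kl_loss(mu, log_var))
```

The closed-form KL expression works because both distributions are Gaussian; the encoder outputs log-variance rather than variance purely for numerical stability.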
So this is one example of how a variational autoencoder might be used to generate digits. Let's take a look: imagine my finger moving around the latent space — this is what the decoder decodes each point to. Some of these might look a bit weird, but imagine a numeric system that had developed differently; some of these symbols could plausibly have appeared as digits. They are viable digit-looking things, even though they're not actual numbers, and that's what's important: we don't have those weird disconnects between points in pixel space.

Cool. Let's move on from digits, because it's a bit of a boring dataset, and look at images instead. We can train exactly the same model on the CelebA dataset — this is something you can do from the book; all of the examples are there for you to build at home. This is just after a couple of epochs of training; it obviously gets better over time, but I wanted to show you how quickly you can get set up with this kind of thing and build something that looks fairly impressive on your laptop. These are all examples of faces that have been generated — some obviously better than others, but they're completely picked at random: I just sample a point from the normal distribution, somewhere around the origin, decode it, and get a face back. Change the point and you get a different face. It's quite fun to play around with.

One of the other fun things you can do with variational autoencoders is latent space arithmetic. There are a few things you can do here. You can add and subtract properties, for one. You take a point by encoding an image into the latent space — say we're in two dimensions, so we've got a two-dimensional vector — and we calculate, say, the "sunglasses" vector: we take every image in our dataset with sunglasses on, take every image without sunglasses on, and subtract the two mean encodings in the latent space. You now have a vector along which you can travel to add sunglasses to any image. Say this is the vector that was calculated: we take the encoded image here, add the sunglasses vector, and decode, and we get the same person with sunglasses on. I've calculated some other vectors here too — the smiling vector, the blonde vector, the male vector, and the sunglasses vector — and just by moving further and further along a vector, you change the input image along that property. Again, you don't need huge amounts of compute to build something like this; this was all done on my laptop with CPUs. GPUs would speed up the training, but the underlying technology is exactly the same: we're just doing good old-fashioned arithmetic in a latent space, and you get amazing results like this.

Another thing you can do is interpolate between two images. Exactly the same idea: encode two images into the latent space, travel along the vector between them with some alpha between zero and one, decode a point partway along, and you get a merge between the two images. You could blend — I don't know who we've got here — Prince Charles's wife over there, and a lady on the left-hand side, so that one slowly morphs into the other. That's quite fun.
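Both tricks are a few lines of vector arithmetic. Here is a hedged sketch, assuming a trained Keras-style `encoder` (returning the latent mean for each image) and `decoder`; the function names, and the idea of averaging the two attribute groups, are illustrative rather than the book's exact code.

```python
import numpy as np

def attribute_vector(encoder, with_attr, without_attr):
    """e.g. mean latent of sunglasses images minus mean latent of the rest."""
    return (encoder.predict(with_attr).mean(axis=0)
            - encoder.predict(without_attr).mean(axis=0))

def add_attribute(encoder, decoder, image, attr_vec, strength=1.0):
    """Move an encoded image along an attribute vector, then decode."""
    z = encoder.predict(image[None, ...])[0]
    return decoder.predict((z + strength * attr_vec)[None, ...])[0]

def interpolate(encoder, decoder, image_a, image_b, steps=10):
    """Walk the straight line between two encoded images and decode each step."""
    z_a = encoder.predict(image_a[None, ...])[0]
    z_b = encoder.predict(image_b[None, ...])[0]
    return [decoder.predict(((1 - a) * z_a + a * z_b)[None, ...])[0]
            for a in np.linspace(0.0, 1.0, steps)]
```

Increasing `strength` corresponds to the "moving along the vector more and more" described above: the decoded face keeps its identity while the attribute gets stronger.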
The last thing you can do here: you can build a model to generate Lego heads, which somebody did the other day. This is a blog post from some chap — please do check it out; it looks really fun, and it's a good example of how, with a really nice dataset, in this case a set of images of Lego heads, you can train a variational autoencoder to do exactly this and build your own examples of what a new Lego head might look like.

Right, okay. So that's variational autoencoders, and it's a great way to get started with generative deep learning. The thing most people think about when they think of GDL, though, is GANs. They're probably the best example these days of how to build really sophisticated generative deep learning models, especially where images are concerned, and we're going to take a look at how they differ from VAEs — because they're not actually that different.

We start with the VAE we've just seen — exactly the same architecture — and we ask: what if we remove the encoder, and instead build a separate model to predict whether an image is real or fake? It's fake if it was generated by the decoder, and real if it comes from our dataset. That is what we're doing here. I'm going to play that again — it took me so long to build that slide transition, it's worth a second look. You can see here that we take the encoder and convert it into what is called, in the GAN framework, a discriminator: its job now is not to convert an image into a latent representation, but to output a single number — how real it believes the image is. Is it something from the training set, or something the decoder has produced? The generator, meanwhile, is identical to the decoder in a VAE. So there's very little between them, really. People always think of GANs as really, really difficult and of VAEs as the simple younger brother, but that's not the case at all; they're both quite simple models at heart. It's just about how you think about them and how you understand what they're doing. The real difference from VAEs is that the VAE is one connected model, with a latent space in the middle that data points get passed through, whereas the GAN is two separate networks that you have to train independently.

So let's see how we do that. Let's talk about training the discriminator first — remember, its job is to discriminate between real and fake images. We generate a batch of images from the generator (the decoder, in the old framework). To begin with, of course, these are going to be awful, because it has no idea how to generate faces — what we're seeing here is maybe three quarters of the way through the training process, where it's generating something pretty good. We mix them with some real images from the CelebA training set, then pass them all through the discriminator and ask it: which do you think are real, and which fake?
There are its predictions, and there's the target — the ground truth. This is just like regular discriminative machine learning, and the loss is something like cross-entropy loss, where we take the prediction and the target and have a metric that says how close the two are. You don't usually use something like root mean squared error here, because this is a binary response column; binary cross-entropy is the right choice of loss metric. You can see it's done pretty well at the top, because it predicted this one is not very real, but it's got this one really wrong: it essentially said "I thought this was a real face" — it predicted 0.9 — but the truth was 0. That's a high loss. You do this again and again, and you train a discriminator. That's how that works.

Training the generator is a little bit more tricky. What we have to do is first generate a batch of images, then pass them through the discriminator to get out this number: how real the discriminator thinks these generated images look. But notice that instead of a target of zero — which is what we used when training the discriminator, because we wanted it to spot that these were fake — we now set the target to one, because we want the generator to produce things that are more likely to be judged real. And when we backpropagate these errors, we must freeze the weights of the discriminator, because otherwise the discriminator would start training against this target variable, and we don't want that: this pass is just for training the generator. We need the discriminator only as a mechanism for producing the numbers that the generator can be trained on.

So those are the two training processes, and what we do is iterate them. You train the discriminator for a bit, then the generator for a bit, then the discriminator, then the generator, and you literally play them off against each other. You say to the discriminator, "okay, get better at discriminating these terrible images from the real ones", and it gets the hang of that; then you tell the generator, "do a better job, because the discriminator is now pretty good at spotting your mistakes"; so the generator gets better, and then the discriminator has a harder time. It's like two adversaries — which is where the name comes from — playing off against each other and slowly getting better over time. And that's it; that's all there is to training generative adversarial networks.

There are obviously a number of ways the training process has been enhanced, developed, and improved, both in terms of stability and the speed of training, and there's lots of information in the book about things like the Wasserstein GAN, which was one of the first ways the stability was improved, and WGAN-GP, which is now a standard way of training a GAN. And you can do similar things to what you could do with VAEs: you can use GANs to generate new faces, and you can see the quality of the faces here is incredible — and this isn't even state of the art these days; this is just StyleGAN, not even the second version of it.
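Here is a minimal sketch of that alternating loop in Keras. The tiny dense networks and all the sizes are stand-ins — real GANs use convolutional architectures — but the freezing trick and the flipped target are the essential mechanics just described.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

latent_dim, img_shape = 100, (28, 28, 1)

generator = keras.Sequential([
    layers.Dense(28 * 28, activation="sigmoid", input_shape=(latent_dim,)),
    layers.Reshape(img_shape),
])
discriminator = keras.Sequential([
    layers.Flatten(input_shape=img_shape),
    layers.Dense(1, activation="sigmoid"),            # P(image is real)
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model: trains the generator THROUGH a frozen discriminator.
discriminator.trainable = False
gan = keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

def train_step(real_images):
    n = len(real_images)
    noise = np.random.normal(size=(n, latent_dim))
    fake_images = generator.predict(noise, verbose=0)
    # 1. Discriminator: real -> 1, fake -> 0.
    discriminator.train_on_batch(real_images, np.ones((n, 1)))
    discriminator.train_on_batch(fake_images, np.zeros((n, 1)))
    # 2. Generator: target is 1, because we want fakes judged as real.
    gan.train_on_batch(np.random.normal(size=(n, latent_dim)),
                       np.ones((n, 1)))
```

Because the discriminator was compiled before `trainable` was set to `False`, its own `train_on_batch` still updates its weights, while inside `gan` it stays frozen — the standard Keras pattern for the two-phase training the talk describes.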
As before, you generally get sharper images with GANs than you do with VAEs, which tend to be a bit blurry; there are a number of reasons for that. The other thing you can do — I've got it here — is the same sort of playing around in the latent space that you could do with variational autoencoders, so I encourage you to experiment and find out for yourself what these things can do.

Okay, I want to very quickly cover a topic called world models. We're going back to the VAEs we saw earlier, because we want to ask: can we use these in another context, other than just generating cool faces? That's nice, but how can we actually use this in a practical sense? The answer is yes. You can see here something called reinforcement learning, where we're trying to train an agent to achieve a task in an environment. The agent performs actions — in this case it might be steering left or right, or pressing the accelerator — and we give those actions to the environment. The environment then says, "okay, I'm going to give you a reward if you do something well, and I'm going to give you the new state that you use to produce your next action", and so the loop continues. There are obviously tons of techniques out there for training reinforcement learning algorithms, but what was impressive here is how they applied generative deep learning to the reinforcement learning setting.

I'm going to take you through this diagram; it's a very stripped-down version of what they did. This is the same loop we've just seen, where the environment gives a state back to the model and the model ultimately gives an action back to the environment — but right in the middle is a variational autoencoder. What it's doing is mapping the observed image into a latent space, and then the world model's job is to understand how that latent space evolves over time. It's much easier to understand how the latent space evolves than the pixel space, because the pixel space is full of noise — things like this barrier here, which really doesn't do anything in the environment, or these green pixels, which mean nothing to the car; all it cares about is the road immediately in front of it. What they realised is that they could build a latent space whose evolution the world model learns over time, including how it evolves given the action just taken — which is what this dotted line at the bottom is. They then used some other techniques — evolutionary algorithms — to build this controller over here, which says: given what the world model is telling me, how should I translate that into an action?

Just to give you an idea of what this is doing, here is the output from the variational autoencoder once it's been trained. You map the observation into — I think they used 32 dimensions — and then map back into pixel space, and you can see the two are very closely aligned. That means that with just 32 numbers they could tell you pretty much exactly what the state of the environment was, as opposed to the thousands and thousands of numbers in the pixel representation, which is a huge grid with three RGB channels.
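As a stripped-down sketch of that loop: the three trained components (VAE encoder, recurrent world model, controller) are replaced below with NumPy stand-ins. The 32-dimensional latent matches the talk; everything else — shapes, functions, weights — is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, H_DIM, ACTION_DIM = 32, 16, 3          # latent, memory, action sizes

W_c = rng.normal(size=(ACTION_DIM, Z_DIM + H_DIM)) * 0.1  # controller weights

def encode(observation):
    """Stand-in for the VAE encoder: pixels -> 32 latent numbers."""
    return rng.normal(size=Z_DIM)

def world_model_step(z, action, h):
    """Stand-in for the recurrent world model: update memory from (z, action)."""
    return np.tanh(0.9 * h + 0.1 * (z[:H_DIM] + action.sum()))

def agent_step(observation, h):
    z = encode(observation)                          # compress the frame
    action = np.tanh(W_c @ np.concatenate([z, h]))   # act on latent + memory
    h = world_model_step(z, action, h)               # update belief about the future
    return action, h

h = np.zeros(H_DIM)
action, h = agent_step(observation=np.zeros((64, 64, 3)), h=h)
```

The design point the paper makes is visible even in the stand-ins: the controller is tiny (one weight matrix here), because all the heavy lifting — compressing pixels and predicting the future — lives in the VAE and the world model, which is why an evolutionary algorithm can train the controller at all.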
The other thing you can do, of course, is play around with this latent space, as we saw before. You take z, the hidden representation, tweak some of the variables, and you can create new tracks that might exist in this world — you can see here that just by playing around with a few values, you can create bends. What's important to understand is that the model knows what is and isn't possible in this world, in terms of states it might come up against. All it needs to do now is understand how z evolves over time; it doesn't really need to know anything about what the environment looks like. Its whole understanding of the world is in this latent space. So the agent learns a world model of how the latent space evolves, given an action.

But the crucial step — and this is where the paper made progress on what had been done before — is that they realised you could replace the entire environment with a copy of the trained world model, and then train the agent within its own dream of how the environment evolves, rather than actually asking the environment what happened. That's the real magic of this. Here's an example where the environment has been used to get a feel for the physics and what happens, but not to train on any specific task. When you're training the world model, you're basically saying "just play around and see how things evolve", much like a child might do: it knocks into things and realises that certain actions cause it to go off the track, or to spin around in a circle and not get very far, and so on. Once the agent has that world-model understanding, you can then say, "now train yourself to go as fast as possible around the track", and it's like, "oh yes, okay, I understand — if I press the accelerator, I know that I go forward, so I don't need the environment any more to give me feedback, because I've learned that." You can literally replace the entire environment with the agent's own world model; that's what learning in its own dreams means.

Even more: this is actually it learning here. Once the world model is trained, you can see what the agent imagines is coming up in the future as it takes actions — you can see there, it turned left. None of this exists; we're not using the environment for any of these reconstructions. It's taking z, imagining how it evolves over time, and throwing the result back out through the decoder just so that we can see what's going on. The agent doesn't need any of these images; it's just understanding how the latent evolves — and it's giving itself rewards as well. That's really important: it has learned not only how the environment evolves, but also what reward it would get for an action, because it learned that from the environment too.

And amazingly, this is not just as good as what came before — it's better. If you learn in the dream, you reach state of the art much quicker, and in fact go beyond what had previously been achieved. These are the results straight from the paper; it's quite amazing that it achieves almost a perfect score having never learned the task in the environment, but only in its own dreams of it.
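Continuing the stand-ins from the sketch above, a "dream rollout" looks something like this: the loop never calls the real environment — the imagined transitions and rewards come from the (stand-in) world model, and the returned score is what the evolutionary algorithm would use as fitness for the controller.

```python
def dream_step(z, action, h):
    """Stand-in for one imagined transition: next latent, memory, and reward."""
    h = world_model_step(z, action, h)            # imagined memory update
    z_next = z + 0.01 * rng.normal(size=Z_DIM)    # imagined next latent state
    reward = float(action[1])                     # imagined reward (e.g. throttle)
    return z_next, h, reward

def dream_rollout(steps=100):
    """Evaluate the controller entirely inside the learned world model."""
    z, h, total = rng.normal(size=Z_DIM), np.zeros(H_DIM), 0.0
    for _ in range(steps):
        action = np.tanh(W_c @ np.concatenate([z, h]))
        z, h, reward = dream_step(z, action, h)   # no environment involved
        total += reward
    return total  # fitness used to evolve the controller weights W_c

print(dream_rollout())
```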
Now, why does this help AGI — artificial general intelligence? Let's imagine what this might mean for the future and why it's important. First of all, I think it's important to start from ground principles and ask: what are we doing at the moment in reinforcement learning? Generally, we're asking the question: how do the rewards that the environment gives back convert into actions that I should take? And in supervised learning, another massive field of machine learning, we're asking: how should the labels I give the model convert into predictions for things it hasn't seen before?

But — and this is what I've been thinking about, and I encourage you to as well — do any of these things really exist in the world that we put an agent into, or are they something we have to give it? The more I think about this, the more I think that the ideas of labels, actions, and rewards in an environment don't really exist. Nature isn't so kind as to tell us what we need to do. That's quite profound to think about, and when you start thinking about it, you realise more and more that the only thing the agent has to work with is the field of data it exists within. The only thing I can think of that an agent could do is imagine how that data might evolve over time — and perhaps, if agents get good at that, they naturally get better at doing things in the environment that make them stay around longer. That's the key idea here: instead of giving an agent a task that we, as human overlords, hand to it, we need to find a way of building agents and environments where the agent is free to explore — to explore generatively — and to understand how the environment evolves over time, given the things it might do in it.

I think what would happen, if you did that with a powerful enough agent, is that over time it would realise that it exists in the environment — that there's a direct correlation between things it sees itself do and things that happen. And that should come about not through us telling it that that's what it needs to look for, but because it's something it finds, intuitively, that it needs to explore more, because it's something it doesn't yet understand. This idea that agents need to be inquisitive has come into the literature a lot recently, and actually, training agents simply to be inquisitive turns out to be enough for them to start displaying intelligence. Here are three examples of the ways generative modelling is starting to creep not only into the machine learning literature but also into the psychology literature: the idea that the brain may just be one big generative model — whether it's a generative adversarial network or some other form — that is looking at things and trying to imagine what might happen in the future, because that's what it needs to be good at. Lots of people are exploring this now, and it's creeping more and more into people's ideas about what machine learning should really be trying to achieve — and into the view that we shouldn't be putting too much structure on it.
So what might this mean, and how might it work in practice? Again, these are my own thoughts, and feel free to mull them over on the tube on the way home; it's quite fun to think about. What if we could train generative models that the agent builds up about its environment — just like the car example we saw before — but instead of its sole goal being to solve one given task, it creates its own tasks to solve? That would mean it has to create its own reward system: a function that takes things like the state and converts them into a reward that it wants to learn — not because the environment or we have told it to, but because it wants to. Likewise, it would decide for itself what it wants to treat as an action and what it wants to treat as a label. I think that's the way machine learning needs to go if we're going to build something intelligent across multiple tasks. Obviously this is a field that's very nascent and naive at the moment, but I think we need to start thinking about it as data scientists if we really want to progress AGI.

I want to finish up with this video. Have a look at this agent in this environment and ask yourself: where is the reward coming from? All it's done is put a block on top of another block, and there seems to be some sort of reward function kicking in quite spectacularly. [Audience: dopamine!] Yes, exactly — dopamine. And I think a really interesting point on that is how the agent controls its own dopamine to achieve what it wants to achieve, rather than the environment telling it how to use its dopamine, if you like. There's obviously some sort of overdrive of enjoyment here. We need to build things that intuitively know they need to explore the world more, in order to create situations like this.

A great way to summarise it: Richard Feynman, one of my personal heroes, said, "What I cannot create, I do not understand." I think it's equally true that what AI cannot create, AI does not understand. That's the end of my talk. Thank you very much.
Info
Channel: H2O.ai
Views: 823
Rating: 4.5555553 out of 5
Id: yC-uzjhxgIw
Length: 44min 15sec (2655 seconds)
Published: Tue Mar 17 2020