A Friendly Introduction to Generative Adversarial Networks (GANs)

Video Statistics and Information

Reddit Comments

This is great, I just got done watching it for my lunch break and it's very easy to follow. Thanks for posting!

u/Lord_Mick_Z, Aug 14 2020 (1 upvote)
Captions
Hello, I'm Luis Serrano, and this video is about generative adversarial networks, or GANs for short, developed by Ian Goodfellow and other researchers in Montreal. GANs are a great advance in machine learning, and they have numerous applications. Perhaps the fanciest application of GANs is face generation: if you go to the page thispersondoesnotexist.com, you'll see it in action. The images you see there are of people who don't exist; they have been fully generated by a neural network, which is fascinating considering how detailed the expressions are.

In this video we learn how to generate faces with a GAN in a very simple way. And great news for those of you who like to sing along to the videos as you type code: in this video we'll be coding a pair of one-layer neural networks which will generate some very simple images. It's all on GitHub, under the repo you can find in the links. But if you don't want to write code, this video is for you as well, since we develop the intuition and the equations by hand.

First, let me give you a general idea of what GANs are. A GAN consists of a pair of neural networks that fight with each other. One is called the generator and the other one the discriminator, and they behave a lot like a counterfeiter and a cop. The counterfeiter is constantly trying to make fake paintings, and the cop is constantly trying to catch him. The counterfeiter is the generator, and the cop is the discriminator. As he gets caught by the cop, the counterfeiter keeps improving his paintings, until one day he learns to paint a perfect one that completely fools the cop. In neural network language, what happens is that we have a pair of neural networks, the generator and the discriminator, and we train them with a set of real images and a set of fake images produced by the generator. The discriminator is trained to identify which images are real and which ones come from the generator, and the generator is trained to fool the discriminator into classifying its images as real.

In this video we'll build a very, very simple GAN: so simple that we'll be able to code it straight in Python, without any deep learning packages. This is our setting. We live in a world called Slanted Land, where everybody looks slightly elongated and walks at a 45-degree angle. This world has technology, but not as developed as ours: they have computer screens which display images, but the best resolution they've been able to work out is 2 by 2, so they have black-and-white screens of 2 pixels by 2 pixels. They have also developed neural networks, but only very simple ones; as a matter of fact, they only know neural networks of one layer. But I will show you that in this simple world we can still create a GAN that will generate faces of the people who live in Slanted Land.

Here are four faces of people who live in Slanted Land. Notice that everybody is elongated and tilted 45 degrees to the left. Since the screens are only 2 pixels by 2 pixels, this is how the pictures of the people look on screen: they look somewhat like a diagonal, or a backslash. And this is how noisy images look. Notice that these are not faces, since they don't look like a backslash; they are mostly generated randomly. So the goal for our networks is to be able to distinguish faces like these from noisy images, or non-faces, like these ones.

Let's attach some numbers here. We'll have a scale where a white pixel has a value of 0 and a black pixel has a value of 1. You may have seen this the opposite way, but we'll do it like this for clarity. This way we can attach a value to every pixel on the 2-by-2 screens.
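To make this setup concrete, here is a minimal sketch, in plain Python with NumPy, of how the faces and the noisy images might be encoded. The specific pixel values and names are illustrative assumptions, not the exact ones from the repo.

```python
import numpy as np

# Hand-coded "faces" from Slanted Land: dark (close to 1) top-left and
# bottom-right pixels, light (close to 0) top-right and bottom-left pixels.
# Each row is [top-left, top-right, bottom-left, bottom-right].
# Values are illustrative, not the exact ones from the repo.
faces = np.array([
    [1.0, 0.0, 0.0, 1.0],
    [0.9, 0.1, 0.2, 0.8],
    [0.9, 0.2, 0.1, 0.9],
    [0.8, 0.1, 0.2, 0.9],
])

def random_noise():
    """A noisy non-face: four pixel values drawn uniformly at random."""
    return np.random.rand(4)
```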
OK, we're ready to build our networks. First I'll build them by hand, and then I'll show you how to get the computer to train them.

Let's start by building the discriminator. The first question is: how are we going to tell faces apart from non-faces? Easy. Notice that in the faces, the top-left and bottom-right corners have large values, because their pixels are dark, whereas the other two corners have small values, because their pixels are light. In noisy images, on the other hand, anything can happen. Therefore, the way to tell faces apart is by adding the two values corresponding to the top-left and bottom-right corners and subtracting the values corresponding to the other two corners. For faces this result will be very high, whereas for noisy images it will be low. For example, for this face the value is 2, and the value for the noisy image is minus 0.5. We can add a cutoff, or threshold, of say 1, and say that any image that scores 1 or higher is a face, and any image that scores less than 1 is not a face; it's a noisy image.

In neural network lingo, this is how things look. The values of our four pixels get multiplied by plus or minus 1, depending on which diagonal they are on, and then we subtract the threshold of 1, which plays the role of the bias. We add these numbers up to get a score; in this case the image gets a score of 1, so it is classified as a face. We can also turn the score into the probability that something is a face by using the sigmoid function, which sends high numbers to numbers close to 1 and low numbers to numbers close to 0. We apply the sigmoid and get sigmoid of 1, which is 0.73. The discriminator network then assigns this image a probability of 73% of being a face. Since this is a high probability, greater than 50%, we conclude that the discriminator thinks the image is a face, which is correct. Notice that in this neural network the thick edges are positive and the thin ones are negative; this is a convention we'll be using throughout the video. Now, if we enter the second image, which is not a face, into the discriminator and do the same calculation, we get minus 0.5 for the score. Sigmoid of minus 0.5 is 0.37, which is lower than 50%, so we conclude that the discriminator thinks this image is not a face. The discriminator is correct again.
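Here is a minimal sketch of this hand-built discriminator, assuming the pixel ordering [top-left, top-right, bottom-left, bottom-right]; the weights and bias are the hand-picked values from the video, not trained ones.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_forward(image):
    """Probability that a 2x2 image (flattened to 4 values) is a face."""
    weights = np.array([1.0, -1.0, -1.0, 1.0])  # +1 on the face diagonal
    bias = -1.0                                 # the threshold of 1, folded in
    score = np.dot(image, weights) + bias
    return sigmoid(score)

face = np.array([1.0, 0.0, 0.0, 1.0])
print(discriminator_forward(face))  # sigmoid(2 - 1) = sigmoid(1) = 0.73
```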
Now, in the same way, let's build the generator. This is a neural network that will output faces. To build it, we again take into account that in faces these two corners have high values and these two corners have low values, whereas in noisy images, or non-faces, anything can happen.

This is how the generator works. First we pick an input z, which is a random number between 0 and 1; in this case, let's say it is 0.7. In general, the input will be a vector that comes from some fixed distribution. Now let's build the neural network. What we really want is some large output values and some small ones; the large ones are represented by thick edges and the small ones by thin edges, because, remember, we want large values for the top-left and bottom-right corners and small values for the top-right and bottom-left corners. Since the top output has to be large, we want the weights coming into it to be large, so let's make them plus 1. What do we get for the output? A score of plus 1 times 0.7 plus 1, which is 1.7. Now let's look at the second node. It corresponds to the top-right corner, so it has to have a small value; let's put negative numbers here. For the score we get minus 1 times 0.7 minus 1, which is minus 1.7. In the same way, we want a small value here, so we put weights of minus 1 and get minus 1.7; and for the last one we want a high value, so we put plus ones and get plus 1 times 0.7 plus 1, which is 1.7.

Those are just the scores; we need to apply the sigmoid to get the pixel values. We apply the sigmoid and get 0.85, 0.15, 0.15, and 0.85. Those are the values that go in our pixels, and notice that the image looks like a diagonal, which is how we defined our faces. Notice also that, by the way we built it, this neural network will always generate large values for the top-left and bottom-right corners and small values for the top-right and bottom-left corners, no matter what value of z we input, because z is between 0 and 1. Therefore this neural network will always generate a face, which means it's a good generator.
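A matching sketch of the hand-built generator, with the same illustrative pixel ordering and the hand-picked weights from the video:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generator_forward(z):
    """Map a random z in (0, 1) to four pixel values with one layer."""
    weights = np.array([1.0, -1.0, -1.0, 1.0])  # one weight per output pixel
    biases = np.array([1.0, -1.0, -1.0, 1.0])   # hand-picked, as in the video
    return sigmoid(z * weights + biases)

print(generator_forward(0.7))
# scores [1.7, -1.7, -1.7, 1.7] -> pixels [0.85, 0.15, 0.15, 0.85]: a face
```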
Of course, we built these neural networks by eyeballing the weights, but that's not how it's normally done. In general we have to train the neural networks to find the best possible weights. For this, let me tell you a little bit about error functions. An error, or cost, function is a way to tell the network how it's doing, in order for it to improve: if the error is large, the network is not doing well, and it needs to reduce the error to improve. The error function we'll use to train these GANs is the log loss, which is based on the natural logarithm, the logarithm base e. Why the logarithm? Well, the logarithm appears in a lot of error functions for deeper reasons which we won't cover, but for now we can think of it as a very convenient function.

First, say we have a label of 1 and our neural network predicted 0.1. This is a bad prediction, and it should produce a large error, because 0.1 is very far from 1. On the other hand, if the label is 1 and the prediction is 0.9, that's a good prediction, because the prediction is very close to the label, so it should produce a small error. How can we find a formula for this error? Well, notice that the negative natural logarithm of 0.1 is 2.3, which is large, while the negative logarithm of 0.9 is 0.1, which is low. As a matter of fact, the closer a number is to 1, the smaller its negative logarithm gets. Therefore, when the label is 1, the function negative logarithm of the prediction is a good error function.

Now let's go to the other extreme, when the label is 0. In this case, a prediction of 0.1 would be good, because it's close to the label, so the error should be small. On the other hand, a prediction of 0.9 is terrible, so the error should be large. The function we need here is similar to the previous one, with a slight difference: it is the negative logarithm of 1 minus the prediction. Notice that in the first case the error is the negative logarithm of 0.9, which is 0.1, and in the second case the error is the negative logarithm of 0.1, which is 2.3. This matches what we wanted: the first error is small and the second is large.

To summarize: if the label is 1, which means we want the prediction to be 1, we define the log loss to be the negative logarithm of the prediction. From the graph of the negative logarithm of x on the right, this is big when the prediction is close to 0 and small when the prediction is close to 1. When the label is 0, which means we want the prediction to be close to 0, we use as the error function the negative logarithm of 1 minus the prediction. From the graph of the negative logarithm of 1 minus x on the right, we see that when the prediction is close to 0 there is a low error, and when the prediction is close to 1 there is a high error. These two are the error functions we're going to use for training the generator and the discriminator, depending on whether we want the prediction to be 0 or 1.

Now that we have our error functions, we get to the meat of the training process, which is backpropagation. I will explain it very briefly. The way we train neural networks is by first taking a data point and performing a forward pass, calculating the prediction, and then calculating the error based on the log loss we previously defined. Then we take the derivative of the error with respect to all the weights, using the chain rule; this tells us how much to adjust each weight in order to best decrease the error. The way we decrease the error using the derivative is a process called gradient descent. Long story short: you propagate the error backwards, calculate the gradient, which is the direction of greatest growth of the error, and then take a tiny step in the direction of the negative of this gradient, in order to find new weights that decrease the error as much as possible.

Now we're ready to train the generator and the discriminator, and what we have to do is put the right error functions in the right places. Here's our pair of neural networks; notice that the weights are not yet defined, so we start by setting them to random numbers. Now we select z, some random number between 0 and 1, which serves as the input to the generator. We do a forward pass of the generator to obtain some image, which is probably not a face, since the weights are random; this is our generated image. Now we pass that generated image through the discriminator, so the discriminator can tell if it's fake or not. The discriminator outputs a probability, so let's say, for example, that it's 0.68.

Now pay close attention, because this is where the rubber meets the road; this is where we define the correct error functions. First, let's think: what does the discriminator want to do here? In other words, if the discriminator were great, what should it output? Well, since the image is not a face but a generated image from the generator, the discriminator should be saying that it's fake; that means the discriminator should be outputting a 0. If we remember the error functions, the way we measure the error when we want the neural network to output a 0 is the negative logarithm of 1 minus the prediction. This is the error that will help us train the discriminator. Now that we've figured out what the discriminator wants, let's go to the generator. What does the generator want? Well, the generator's wildest dream is to generate an image so good, so real, that the discriminator classifies it as real. Therefore, the generator wants this whole neural network, the connection of the two, to output a 1; that means the error function for the generator is the negative logarithm of the prediction, and that is the error function that will help us train the weights of the generator. In other words, if G(z) is the output of the generator and D(G(z)) is the output of the discriminator, then the error function for the generator is -log(D(G(z))), and the error function for the discriminator is -log(1 - D(G(z))).
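In code, these two losses might look like the following minimal sketch; the label-1 loss for real images, which the discriminator also needs, is included for completeness.

```python
import numpy as np

def generator_error(d_of_g_z):
    """The generator wants D(G(z)) = 1, so its error is -log(D(G(z)))."""
    return -np.log(d_of_g_z)

def discriminator_error_fake(d_of_g_z):
    """On a generated image the discriminator wants 0: -log(1 - D(G(z)))."""
    return -np.log(1.0 - d_of_g_z)

def discriminator_error_real(d_of_x):
    """On a real face the discriminator wants 1: -log(D(x))."""
    return -np.log(d_of_x)

print(generator_error(0.68))           # 0.39: low, the generator fooled D a bit
print(discriminator_error_fake(0.68))  # 1.14: high, D should have said fake
```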
The derivatives of these two are what will help us update the weights of both neural networks in order to improve that particular prediction. Notice that these two error functions fight against each other, but that is okay, because the red error function only changes the generator weights and the blue error function only changes the discriminator weights. Therefore they don't collide; they simply push each neural network to improve, which is fascinating.

So what we have to do now is repeat this process many times. We pick a random value for z, apply the generator to produce a fake image, apply the discriminator to that image, and use backpropagation to update the weights of both the generator and the discriminator. Then we take a real image, plug it into the discriminator, and update its weights again using backpropagation.

So what happens after many of these iterations, or epochs? Well, we ran the training in the notebook and got these values for the weights. To make it easier, I've drawn thick edges for the positive weights and thin edges for the weights that are negative or zero. Feel free to pause the video and convince yourself that these two neural networks actually work really well: the generator at generating realistic-looking faces, and the discriminator at telling the faces apart from the non-faces. The reason, if we remember from the beginning of the video, is that the top-left and bottom-right corners of a face should be big and the other two should be small. So let's convince ourselves by looking at the generator. Since the input is between 0 and 1 and the two top edges are positive, the sigmoid value of this output is large, which is the value of this pixel over here; therefore that pixel has a large value. Similarly, these two weights are also positive, and they give us a large value for this pixel over here. Now, these two over here are negative, so they give us a small value for this pixel, and these two over here are negative, so they give us a small value for this one. Therefore our image looks a lot like a diagonal, and given that the resolution of the screens in Slanted Land is only 2 pixels by 2 pixels, this picture looks uncannily like this resident of Slanted Land over here. Thus we have built a network that generates faces.

Now, as promised, here's the code for you to sing along to the video; it's in the repo called GANs on my GitHub. First we have the faces, which we hard-code, and the random noise, the images we generate that are not necessarily faces. Then we carefully work out the derivatives for the discriminator, based on faces and then based on noisy images, both with respect to the weights and with respect to the biases; these are all coded in the Discriminator class. We also work out the derivatives corresponding to the error function for the generator, again with respect to the weights and the bias, and these are all coded in the Generator class. We also have error function plots: we can plot the error function for the generator and the error function for the discriminator. Notice that the generator error goes down and stabilizes, but since the generator ends up fooling the discriminator, the discriminator error doesn't do so well and actually goes up at the end. Finally, we ask our generator to generate some random images, and here they are. Notice that they all look like faces in Slanted Land, which is what we wanted from the beginning. Therefore we have successfully created a GAN that generates faces in Slanted Land.
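Putting the pieces together, here is a self-contained sketch of the whole training loop, under the same illustrative assumptions as above. It uses finite-difference gradients for brevity; the repo derives the gradients analytically with the chain rule, and all names and values here are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gen_forward(z, p):
    """Generator: p = [w1..w4, b1..b4], one weight and bias per pixel."""
    return sigmoid(z * p[:4] + p[4:])

def disc_forward(image, p):
    """Discriminator: p = [w1..w4, b], one weight per pixel plus a bias."""
    return sigmoid(np.dot(image, p[:4]) + p[4])

def numerical_grad(loss, params, eps=1e-5):
    """Finite-difference gradient; the repo uses analytic derivatives."""
    grad = np.zeros_like(params)
    for i in range(len(params)):
        bumped = params.copy()
        bumped[i] += eps
        grad[i] = (loss(bumped) - loss(params)) / eps
    return grad

gen_p, disc_p = np.random.randn(8), np.random.randn(5)  # random initial weights
faces = np.array([[1.0, 0.0, 0.0, 1.0], [0.9, 0.1, 0.2, 0.8]])  # illustrative
lr = 0.1

for epoch in range(1000):
    z = np.random.rand()
    real = faces[np.random.randint(len(faces))]

    # Discriminator update: wants D(G(z)) -> 0 and D(real face) -> 1.
    d_loss = lambda p: (-np.log(1.0 - disc_forward(gen_forward(z, gen_p), p))
                        - np.log(disc_forward(real, p)))
    disc_p = disc_p - lr * numerical_grad(d_loss, disc_p)

    # Generator update: wants D(G(z)) -> 1.
    g_loss = lambda p: -np.log(disc_forward(gen_forward(z, p), disc_p))
    gen_p = gen_p - lr * numerical_grad(g_loss, gen_p)

print(gen_forward(np.random.rand(), gen_p))  # should look like a face
```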
Now, time for some acknowledgments. This video would not be the same if not for the help of my friends, so a big thanks to Diego, Sahil, and Alejandra, who helped me in various ways: encouraging me to learn GANs more seriously, helping me with endless questions, and giving me great feedback on my code. Diego in particular has a great series of blog posts where he codes GANs from scratch in PyTorch and TensorFlow; I used them as inspiration for this video, so I highly recommend them.

And that's all for today. I hope you've enjoyed it. I'd like to remind you that I have a machine learning book called Grokking Machine Learning, in which I explain the concepts of machine learning in a down-to-earth way, with real examples, for everybody to understand. In the description you can find the link to the book and a very special 40% discount for the viewers of this channel. As usual, if you enjoyed this video, please subscribe to my channel for more content, hit like, or share it with your friends, and feel free to write a comment. I really enjoy reading your comments, especially those with suggestions for future topics. If you'd like to tweet at me, my Twitter handle is @LuisLikesMath. All the information, videos, writings, etc. can be found at serrano.academy, so check it out. Thank you very much for your attention, and see you in the next video.
Info
Channel: Luis Serrano
Views: 84,930
Rating: 4.9694753 out of 5
Id: 8L11aMN5KY8
Length: 21min 0sec (1260 seconds)
Published: Tue May 05 2020