DCGAN implementation from scratch

Captions
In this video we will implement DCGAN from scratch and train it to generate images like the ones you see in front of you right now.

Everything we're going to implement originates from the DCGAN paper, which is a very important paper in the development of GANs. Essentially, the authors used deep convolutional networks instead of the fully connected ones we implemented in the last video, and with the modifications they describe they obtained much better quality than what was attained previously. We're not going to read through the whole paper, but let's go to the parts we need for the implementation, starting with the figure. This is what the generator looks like: it takes in some noise, in this case a 100-dimensional vector, and uses ConvTranspose2d, essentially the opposite of a conv layer, to upscale it: first to 1024 channels at 4x4, then to 512 channels at 8x8, and it keeps going like that until it reaches 64x64 with three RGB channels. So the images we're working with are 64x64 pixels. That network is the generator, by the way; the discriminator takes the exact same approach in reverse, using ordinary conv layers. We're going to implement all of that from scratch.

The most important thing is to understand the paper's guidelines, because GANs are incredibly sensitive to hyperparameters, as we saw in the last video, and these guidelines are what keep training stable. First, they didn't use any pooling layers, no max pool and no average pool, just convolutions straight through; they also removed the fully connected hidden layers. They used batch norm in both the generator and the discriminator. They used the ReLU activation for all layers of the generator except the output, which uses tanh, and in the discriminator they used LeakyReLU; they explain the details in the paper, which I recommend reading if you really want to understand everything (it's linked in the description). They used a mini-batch size of 128 and initialized all of the weights of the network from a normal distribution with mean 0 and standard deviation 0.02, and in the LeakyReLU the slope was 0.2. Finally, with the Adam optimizer they used a learning rate of 2e-4, and for the beta parameters (if you're familiar with Adam, beta1 is essentially for the momentum term and beta2 drives an exponential average) they reduced beta1 from the standard 0.9 to 0.5, which stabilized training. Those are all the things we needed to have read, so that when we actually implement this you'll understand where the numbers come from: they're all taken from the paper.
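As a quick sanity check on that figure, here is a minimal sketch (my own, not code from the video) of the transposed-convolution arithmetic that grows a 1x1 noise "image" into those feature maps:

```python
import torch
import torch.nn as nn

# ConvTranspose2d output size: (in - 1) * stride - 2 * padding + kernel_size
z = torch.randn(1, 100, 1, 1)  # the 100-dimensional noise as a 1x1 "image"
up1 = nn.ConvTranspose2d(100, 1024, kernel_size=4, stride=1, padding=0)
up2 = nn.ConvTranspose2d(1024, 512, kernel_size=4, stride=2, padding=1)
x = up1(z)
print(x.shape)       # torch.Size([1, 1024, 4, 4])
print(up2(x).shape)  # torch.Size([1, 512, 8, 8]); the doubling continues up to 64x64
```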
All right, so what I'm going to do is have one file that I'll call model.py, where we'll implement the discriminator and the generator, and then a training file where we'll implement the entire training setup.

Starting with the model, for the discriminator we write class Discriminator, inheriting from nn.Module. Its __init__ takes the input images' channel count, channels_img, and features_d, which sets how the channels change as we go through the layers of the discriminator. We call super().__init__(), and before building the network we define a helper block taking in_channels, out_channels, kernel_size, stride, and padding. DCGAN follows a very regular structure of conv layer, batch norm, LeakyReLU, so we're going to utilize that: the block returns an nn.Sequential of Conv2d (with those in_channels, out_channels, kernel_size, stride, and padding, plus bias=False, since the bias is an unnecessary parameter when batch norm follows), then BatchNorm2d(out_channels), then LeakyReLU with a slope of 0.2, following the paper.

Then self.disc is an nn.Sequential. I'm not sure I actually showed this from the paper, but they say they don't use batch norm in the first layer of the discriminator, and also not in the last layer of the generator. Following that, the first layer is a plain Conv2d from channels_img to features_d with kernel size 4, stride 2, and padding 1 (the kernel size, stride, and padding are inferred from the figure we saw earlier), followed by LeakyReLU(0.2), so we skip the batch norm here. After the initial layer we can use the block: block(features_d, features_d * 2) with kernel size 4, stride 2, and padding 1, and since that pattern repeats every time, we just copy-paste it and change the channels to features_d * 2 -> features_d * 4 and features_d * 4 -> features_d * 8.

Maybe I should go through the shapes a little. The input is N x channels_img x 64 x 64; after the first conv it's 32 x 32 (with features_d channels), after the first block 16 x 16, then 8 x 8, then 4 x 4. At the end, when it's 4 x 4, they do one more Conv2d from features_d * 8 down to a single output channel, because remember, all we want here is a single value representing whether the image is fake or real; with kernel size 4, stride 2, and padding 0 this turns the 4 x 4 feature map into 1 x 1. So all that is output is a single value per example, and to ensure it lies between 0 and 1 we send it through a Sigmoid at the end. One more thing: define forward(self, x), which just returns self.disc(x). That's pretty much the entire discriminator; a consolidated sketch follows below.
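Here is a minimal consolidated sketch of that discriminator (a sketch following the walkthrough above, with the spatial size of each stage noted in comments):

```python
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, channels_img, features_d):
        super().__init__()
        self.disc = nn.Sequential(
            # input: N x channels_img x 64 x 64; no batch norm in the first layer
            nn.Conv2d(channels_img, features_d, kernel_size=4, stride=2, padding=1),  # 32x32
            nn.LeakyReLU(0.2),
            self._block(features_d, features_d * 2, 4, 2, 1),      # 16x16
            self._block(features_d * 2, features_d * 4, 4, 2, 1),  # 8x8
            self._block(features_d * 4, features_d * 8, 4, 2, 1),  # 4x4
            nn.Conv2d(features_d * 8, 1, kernel_size=4, stride=2, padding=0),  # 1x1
            nn.Sigmoid(),  # single value in (0, 1): real vs. fake
        )

    def _block(self, in_channels, out_channels, kernel_size, stride, padding):
        # conv -> batch norm -> LeakyReLU, the repeating DCGAN structure
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.disc(x)
```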
We do a very similar thing for the generator. class Generator inherits from nn.Module, and its __init__ takes z_dim (the noise dimension), channels_img, and features_g, similar to what we did in the discriminator; then we call super().__init__(). self.net is an nn.Sequential, and we define the block first, again taking in_channels, out_channels, kernel_size, stride, and padding. Instead of a conv layer we need to upscale, so the block returns an nn.Sequential of ConvTranspose2d. I'm not going to go into depth on exactly how a transposed convolution works; you can view it as doing, in a sense, the opposite of a conv layer. We pass it the in_channels, out_channels, kernel_size, stride, and padding, and set bias=False because we're going to use batch norm, and with batch norm we don't need the bias. Then BatchNorm2d(out_channels), and then ReLU; why we use ReLU here is again just copying the DCGAN paper.

Now we can use that block in our generator. In the generator we can actually use batch norm from the beginning; it was only in the discriminator's first layer that we couldn't, and here it's instead the last layer that goes without batch norm. So the first layer is self.block(z_dim, features_g * 16) with kernel size 4, stride 1, and padding 0.
What happens there is that the input is N x z_dim x 1 x 1, where N is the batch size, and the block turns it into N x (features_g * 16) x 4 x 4; what's really important is the spatial size, so it's 4 x 4 after that block. Then we add another block whose input channels are features_g * 16 and output channels features_g * 8, with kernel size 4, stride 2, and padding 1, which gives 8 x 8. One thing I realize now is that it might not make much sense why we're doing times 16 and times 8: all of these are really interpreted from the figure in the paper. For all of this to make sense later on, features_g, and also features_d for the discriminator, are going to be set to 64, and if you do the channel computations with that value, the numbers just work out. So really this is just looking at the figure and interpreting what the channels must be; they follow the nice structure that if features_g is 64, the multipliers go times 16, then times 8, then times 4, and then times 2. We repeat the block two more times: features_g * 8 -> features_g * 4 (16 x 16) and features_g * 4 -> features_g * 2 (32 x 32). For the final layer we don't want to use the block, because we don't want batch norm there, so it's a plain ConvTranspose2d from features_g * 2 to channels_img with kernel size 4, stride 2, and padding 1, very similar to what we did above, giving the 64 x 64 output, followed by nn.Tanh. As in the last video, we use tanh because it squashes the output to between -1 and 1, which is exactly the range we're going to normalize the images to; since the images live in that range, we want the model outputs in the same range.

Hopefully this works now. Actually, we need to do one more thing: create a function to initialize the weights of a model, as they did in the DCGAN paper, with mean 0 and standard deviation 0.02. We loop for m in model.modules() and check isinstance against nn.Conv2d, nn.ConvTranspose2d, and nn.BatchNorm2d; I first wrote those as separate if statements, but since they're all initialized the same way we can merge them into one check and remove the two additional ifs. A sketch of the generator and the weight initialization follows below.
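Here is a minimal sketch of the generator just described, plus the weight-initialization helper (following the walkthrough; the network attribute is already named gen, the rename made during the test below):

```python
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim, channels_img, features_g):
        super().__init__()
        self.gen = nn.Sequential(
            # input: N x z_dim x 1 x 1
            self._block(z_dim, features_g * 16, 4, 1, 0),           # 4x4
            self._block(features_g * 16, features_g * 8, 4, 2, 1),  # 8x8
            self._block(features_g * 8, features_g * 4, 4, 2, 1),   # 16x16
            self._block(features_g * 4, features_g * 2, 4, 2, 1),   # 32x32
            # last layer: no batch norm, and tanh maps the output into [-1, 1]
            nn.ConvTranspose2d(features_g * 2, channels_img, kernel_size=4, stride=2, padding=1),  # 64x64
            nn.Tanh(),
        )

    def _block(self, in_channels, out_channels, kernel_size, stride, padding):
        # transposed conv -> batch norm -> ReLU, per the DCGAN paper
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride, padding, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.gen(x)


def initialize_weights(model):
    # DCGAN paper: every weight drawn from N(mean=0.0, std=0.02)
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.BatchNorm2d)):
            nn.init.normal_(m.weight.data, 0.0, 0.02)
```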
Then let's just do a test. We set N, in_channels, height, and width: say 8 examples, 3 channels for RGB, and 64 x 64. We set z_dim to 100 and generate x as torch.randn of (N, in_channels, height, width). Then we initialize our discriminator, Discriminator(in_channels, 8), with 8 for features_d, call initialize_weights on it, and assert that disc(x).shape is (N, 1, 1, 1), because we want just one value per example. Let's run that first and make sure it works; it did. Then the generator: Generator(z_dim, in_channels, 8), with features_g of 8. We generate some latent noise z of shape (N, z_dim, 1, 1) and assert that when we feed that noise to the generator, the output shape is (N, in_channels, height, width); if that runs without errors, we print "success". Oh, for the generator we forgot the forward method: def forward(self, x), and actually, let's not call the network net, let's call it gen, so all we return is self.gen(x). Run that again, also call initialize_weights on the generator, and it works.

So that's it for the model architecture, and hopefully you were able to follow. The only tricky part, I guess, is understanding why the channels work out, and for that you might have to compare against the paper, go through it step by step, and check line by line that the channels match.
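Here is what that smoke test looks like as code (a sketch; it assumes the model sketches above live in model.py):

```python
import torch
from model import Discriminator, Generator, initialize_weights

def test():
    N, in_channels, H, W = 8, 3, 64, 64
    z_dim = 100
    x = torch.randn((N, in_channels, H, W))
    disc = Discriminator(in_channels, 8)
    initialize_weights(disc)
    assert disc(x).shape == (N, 1, 1, 1)  # one value per example
    gen = Generator(z_dim, in_channels, 8)
    initialize_weights(gen)
    z = torch.randn((N, z_dim, 1, 1))
    assert gen(z).shape == (N, in_channels, H, W)
    print("success")

test()
```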
All right, here we're importing everything we'll need for the training setup; all of it is pretty basic, and from model we're also importing the Discriminator, the Generator, and the initialize_weights function. I'm going to code all of this from scratch rather than copy-paste, but a lot of what we do now is very similar to what we did in the last video when we built our simple GAN, so it's good repetition nonetheless.

First the hyperparameters. The device is torch.device("cuda" if torch.cuda.is_available() else "cpu"). The learning rate is 2e-4, copied straight from the paper; the batch size is 128, also from the paper; the image size is 64. The channels of the image we set to 1, because we'll first train on the MNIST dataset, just to see that it produces something that looks good, hopefully much better than what we got in the last video; afterwards we'll also train on a celebrity dataset, but I'll show how to add that later, since it only requires modifying two lines at the end. Then z_dim is 100, the number of epochs is 5 (let's just run for five), features_disc is 64, and features_gen is 64; we need 64 for both so the channel counts match exactly what they did in the paper.

After that we set up the transforms: transforms.Compose of transforms.Resize to the image size, transforms.ToTensor, and transforms.Normalize with 0.5 for the mean and 0.5 for the standard deviation; in my experiments this works a lot better than using the exact dataset mean and standard deviation. Actually, I'm going to write those as a list, [0.5 for _ in range(channels_img)], and the same for the standard deviation. The reason is that when we change channels_img to 3 for the other dataset, we won't have to modify the Normalize at all; it stays general no matter what channels_img is set to.

Then the dataset: datasets.MNIST with root="dataset", train=True, transform=transforms, and download=True, and the loader (we can just call it loader) is a DataLoader of the dataset with batch_size=batch_size and shuffle=True. I'm going through this a little quickly, since it's what we did in the last video and it's simple code. Now we initialize the generator, Generator(z_dim, channels_img, features_gen), and send it to the device, and the same for the discriminator, Discriminator(channels_img, features_disc), also to the device. The generator and discriminator feature counts are equal here, so you could use a single constant, but keeping them separate means you could use different channel counts. Then we call initialize_weights on the generator and on the discriminator.

Next the optimizers: opt_gen is optim.Adam(gen.parameters()) with lr set to the learning rate and betas=(0.5, 0.999); this is important, it's from the paper: all they wanted to change was beta1, so the standard 0.999 stays for beta2. Likewise opt_disc is optim.Adam(disc.parameters(), lr=learning_rate, betas=(0.5, 0.999)). Then the criterion, our loss function, is nn.BCELoss. Pretty similarly to the last video, we set some fixed noise so we can see the progression as training goes on: torch.randn(32, z_dim, 1, 1), sent to the device (z_dim here because that's what we called it above). We create a SummaryWriter for TensorBoard at logs/real, and copy that to make another one for the fakes at logs/fake. We'll also need a step counter for writing to TensorBoard, and we set both networks to training mode; they should be by default, but just in case for some reason they aren't.
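Collected, the setup looks roughly like this (a sketch of what was just described; the constant names and root paths are my rendering of the video's choices):

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from model import Discriminator, Generator, initialize_weights

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
LEARNING_RATE = 2e-4   # from the paper
BATCH_SIZE = 128       # from the paper
IMAGE_SIZE = 64
CHANNELS_IMG = 1       # MNIST for now; set to 3 for the CelebA run later
Z_DIM = 100
NUM_EPOCHS = 5
FEATURES_DISC = 64
FEATURES_GEN = 64

transforms = transforms.Compose([
    transforms.Resize(IMAGE_SIZE),
    transforms.ToTensor(),
    # 0.5 per channel, so nothing changes when CHANNELS_IMG becomes 3
    transforms.Normalize(
        [0.5 for _ in range(CHANNELS_IMG)], [0.5 for _ in range(CHANNELS_IMG)]
    ),
])

dataset = datasets.MNIST(root="dataset", train=True, transform=transforms, download=True)
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

gen = Generator(Z_DIM, CHANNELS_IMG, FEATURES_GEN).to(device)
disc = Discriminator(CHANNELS_IMG, FEATURES_DISC).to(device)
initialize_weights(gen)
initialize_weights(disc)

# beta1 = 0.5 per the paper; beta2 keeps the default 0.999
opt_gen = optim.Adam(gen.parameters(), lr=LEARNING_RATE, betas=(0.5, 0.999))
opt_disc = optim.Adam(disc.parameters(), lr=LEARNING_RATE, betas=(0.5, 0.999))
criterion = nn.BCELoss()

fixed_noise = torch.randn(32, Z_DIM, 1, 1).to(device)
writer_real = SummaryWriter("logs/real")
writer_fake = SummaryWriter("logs/fake")
step = 0
gen.train()
disc.train()
```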
Now for setting up the training: for epoch in range(NUM_EPOCHS), and this constant should be capitalized, then for batch_idx, (real, _) in enumerate(loader). We don't need the actual labels, hence the underscore, because training GANs is unsupervised. We send real to the device to make sure it's on the GPU, and generate some noise with torch.randn(batch_size, z_dim, 1, 1), also sent to the device.

Then we train the discriminator, which means maximizing log(D(x)) + log(1 - D(G(z))). If this is the first of these videos you're watching, I explain this in much more detail in the previous two videos, so I'll go through it more quickly here; if you're confused by the training part, specifically where we train the discriminator and then the generator, do watch those. disc_real is the discriminator applied to real, reshaped with .reshape(-1) so we get a single value per example instead of N x 1 x 1 x 1. The loss on the real images, loss_disc_real, is criterion(disc_real, torch.ones_like(disc_real)). Then the fake side: disc_fake is the discriminator applied to fake, again reshaped to -1, and loss_disc_fake is criterion(disc_fake, torch.zeros_like(disc_fake)). loss_disc is (loss_disc_real + loss_disc_fake) / 2; you don't have to divide by two, but it kind of makes sense. Then disc.zero_grad(), loss_disc.backward(), and opt_disc.step().

That's training the discriminator. Next we train the generator, to minimize log(1 - D(G(z))), but as we talked about in the last video, because of the saturating-gradient problem this is the same thing as maximizing log(D(G(z))). So output is the discriminator applied to fake. By the way, one thing we need in the discriminator step is backward(retain_graph=True): we reuse fake in this second part, but PyTorch frees the intermediate results when we call loss_disc.backward(), and fake was used in that computation, so retain_graph=True lets us reuse it when we train the generator. Then we reshape(-1), compute the generator loss as criterion(output, torch.ones_like(output)), then gen.zero_grad(), loss_gen.backward(), and opt_gen.step().

Then, I said I wouldn't copy-paste code, but I will copy this last bit because it doesn't say much. Occasionally we print the loss; I'm not really sure why, to be honest, since the loss doesn't tell you anything with GANs, so you could remove the printing. What the block does is: with torch.no_grad(), generate some fake images from the fixed noise, use torchvision.utils to make a grid of images, and write that to TensorBoard.
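Put together, the loop looks roughly like this (a sketch continuing the setup above; logging every 100 batches is my choice of interval for the "occasionally" just mentioned):

```python
for epoch in range(NUM_EPOCHS):
    for batch_idx, (real, _) in enumerate(loader):  # labels unused: GANs are unsupervised
        real = real.to(device)
        noise = torch.randn(BATCH_SIZE, Z_DIM, 1, 1).to(device)
        fake = gen(noise)

        # Train discriminator: maximize log(D(x)) + log(1 - D(G(z)))
        disc_real = disc(real).reshape(-1)
        loss_disc_real = criterion(disc_real, torch.ones_like(disc_real))
        disc_fake = disc(fake).reshape(-1)
        loss_disc_fake = criterion(disc_fake, torch.zeros_like(disc_fake))
        loss_disc = (loss_disc_real + loss_disc_fake) / 2
        disc.zero_grad()
        # retain_graph=True so the graph behind `fake` survives for the generator update
        loss_disc.backward(retain_graph=True)
        opt_disc.step()

        # Train generator: maximize log(D(G(z))) (the non-saturating form)
        output = disc(fake).reshape(-1)
        loss_gen = criterion(output, torch.ones_like(output))
        gen.zero_grad()
        loss_gen.backward()
        opt_gen.step()

        if batch_idx % 100 == 0:
            with torch.no_grad():
                fake_on_fixed = gen(fixed_noise)
                img_grid_real = torchvision.utils.make_grid(real[:32], normalize=True)
                img_grid_fake = torchvision.utils.make_grid(fake_on_fixed[:32], normalize=True)
                writer_real.add_image("Real", img_grid_real, global_step=step)
                writer_fake.add_image("Fake", img_grid_fake, global_step=step)
            step += 1
```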
That's pretty much all we're doing. Then we have to update the step variable, which is what's used to see the progression of the images over time. That should be it for the training and the model, so let's run this and hope it works.

All right, it didn't work. First, "is_available": that should be torch.cuda.is_available. Then "fake is not defined": we use fake right there in the discriminator step, but of course we need to generate it first by calling the generator on the noise; now we have the fake images. And I made one more mistake: the fake loss should be taken on the discriminator's output for the fakes, torch.zeros_like(disc_fake), because in this part we're training on the generated fakes.

So this is the result we get after running for about five epochs. I would actually have liked to run it a little longer: some of these don't look that good, some look pretty good, and some can definitely be improved, but nonetheless they look incredibly much better than the ones we got with a fully connected network. Maybe that's something you can try: run it for longer and see if you can get something that looks better.

What I want to do now is change the dataset to the CelebA dataset, which is basically a bunch of images of celebrities. The layout is a folder, celeb_dataset, and inside it another folder with the images. We arrange it that way because then we can use datasets.ImageFolder, which loads everything for us automatically. So we comment out the MNIST part and write dataset = datasets.ImageFolder(root="celeb_dataset", transform=transforms). That, plus setting channels_img to 3, is all we need to change; everything else stays the same. One more thing: if you want to run on this dataset too, I had some difficulty downloading it, so I'm uploading it to Kaggle and will put a link in the description of the video; you can download it there and follow the steps in this video.
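The two-line swap, spelled out (the folder name celeb_dataset matches the layout described above):

```python
# CelebA instead of MNIST: an ImageFolder pointed at celeb_dataset/,
# which contains one subfolder holding the images.
# Also set CHANNELS_IMG = 3 up top; nothing else changes.
# dataset = datasets.MNIST(root="dataset", train=True, transform=transforms, download=True)
dataset = datasets.ImageFolder(root="celeb_dataset", transform=transforms)
```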
All right, so let's run it and train on the CelebA dataset. I stopped the training early because I really want to finish this video today, but I let it run for about three epochs, and this is the result we get: obviously far from perfect, and we don't really expect it to be great, but I think that with just a few more epochs it would look even better. This is a genuinely hard task we're asking of the network, and in future videos we'll see more advanced architectures that really do get better performance.

I really hope you were able to follow the steps of this video and understand DCGAN and its implementation. DCGAN works a lot better than the network we implemented in the previous video, but it still suffers from being really, really sensitive to hyperparameters; I encourage you to try training these GANs and to play around with the hyperparameters, and you will really see that for yourself. In the upcoming video we will focus on how we can improve the stability of GANs, and in future videos we'll try to implement architectures closer to the state of the art. Anyway, thank you so much for watching this video; I hope to see you in the next one.
Info
Channel: Aladdin Persson
Views: 17,338
Keywords: PyTorch DCGAN Tutorial, dcgan pytorch, Gans with CNNs, DCGAN paper implementation
Id: IZtv9s_Wx9I
Length: 35min 37sec (2137 seconds)
Published: Mon Nov 02 2020