PyTorch Tutorial 14 - Convolutional Neural Network (CNN)

Captions
Hi everybody, welcome to a new PyTorch tutorial. Today we are implementing a convolutional neural network and doing image classification on the CIFAR-10 dataset. CIFAR-10 is a very popular image dataset with 10 different classes (airplanes, cars, birds, cats, and others), and it is available directly in PyTorch, so we will create a convolutional neural net that can classify these images.

Now let's talk about convolutional neural networks, very briefly. I will not go into too much detail here, because this tutorial should be focused on the PyTorch implementation, but I will provide further links in the description if you want to learn more. Convolutional neural nets, or ConvNets, are similar to ordinary neural networks: they are made up of neurons that have learnable weights and biases. The main difference is that convolutional nets mainly work on image data and apply so-called convolutional filters. A typical ConvNet architecture looks like this: we have our image, then different convolutional layers with optional activation functions, followed by so-called pooling layers. These layers are used to automatically learn features from the images, and at the end we have one or more fully connected layers for the actual classification task.

The convolutional filters work by applying a filter kernel to the image. We put the filter at the first position in the input image (the red position in the slide), compute the output value by multiplying and summing up all the overlapping values, and write that value into the output image. Then we slide the filter to the next position (the green position) and do the same filter operation, sliding the filter across the whole image until we are done.
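The multiply-and-sum sliding described here can be sketched in a few lines of plain Python; this is a minimal "valid" convolution (no padding, stride 1), and the function name `conv2d_valid` is just for illustration:

```python
def conv2d_valid(image, kernel):
    """Slide a kernel over an image (stride 1, no padding) and
    multiply-and-sum the overlapping values at each position."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):          # vertical filter positions
        row = []
        for j in range(iw - kw + 1):      # horizontal filter positions
            s = sum(image[i + m][j + n] * kernel[m][n]
                    for m in range(kh) for n in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 5x5 input with a 3x3 kernel gives a 3x3 output (the filter does
# not fit into the corners, so the result shrinks).
image = [[1, 2, 3, 4, 5],
         [5, 6, 7, 8, 9],
         [9, 8, 7, 6, 5],
         [5, 4, 3, 2, 1],
         [1, 2, 3, 4, 5]]
kernel = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]  # "identity" kernel: picks the centre pixel of each region
result = conv2d_valid(image, kernel)
print(result)  # [[6, 7, 8], [8, 7, 6], [4, 3, 2]]
```

With the identity kernel the output is just the centre pixel of each 3x3 region, which makes the sliding easy to follow by hand.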
This is how convolutional filters work. Note that with this transform the resulting image may have a smaller size, because the filter does not fit into the corners, unless we use a technique called padding, which we will not cover in this lecture. Getting the correct sizes is an important step that we will see later in practice.

Now let's also talk about pooling layers briefly, in this case specifically max pooling. Max pooling is used to downsample an image by applying a maximum filter to subregions. Here we have a filter of size 2 by 2: we look at each 2-by-2 subregion of the original image and write the maximum value of that region into the output image. Max pooling reduces the computational cost by reducing the size of the image, which reduces the number of parameters the model has to learn, and it also helps to avoid overfitting by providing an abstracted form of the input. These are all the concepts we need to know; again, please check out the provided links if you want to learn more.

Now, enough of the theory, let's get to the code. Here I have already written most of what we need: we import everything, make sure we have GPU support, and define the hyperparameters. If you don't know how I structure my PyTorch files, please watch the previous tutorials, because there I already explained all of these steps. First we load the dataset; as I said, CIFAR-10 is already available in PyTorch's datasets module, so we can use it from there. Then we define the PyTorch datasets and the PyTorch DataLoader, so we automatically get batching and batch training, and I have defined the class names and hard-coded them here. Then we have to implement the convolutional net, and as always we create our model.
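The 2-by-2 max pooling just described can likewise be sketched in plain Python; this is a minimal version with kernel size 2 and stride 2, as in the slide, and `max_pool2x2` is an illustrative name:

```python
def max_pool2x2(image):
    """Downsample by taking the maximum of each 2x2 subregion (stride 2)."""
    out = []
    for i in range(0, len(image) - 1, 2):       # step 2 rows at a time
        row = []
        for j in range(0, len(image[0]) - 1, 2):  # step 2 columns at a time
            row.append(max(image[i][j], image[i][j + 1],
                           image[i + 1][j], image[i + 1][j + 1]))
        out.append(row)
    return out

# A 4x4 image shrinks to 2x2; each output pixel is the max of one subregion.
image = [[1, 3, 2, 4],
         [5, 7, 6, 8],
         [9, 2, 1, 3],
         [4, 6, 5, 7]]
pooled = max_pool2x2(image)
print(pooled)  # [[7, 8], [9, 7]]
```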
Then we create our loss and the optimizer. Since this is a multi-class classification problem, we use the cross-entropy loss, and as optimizer we use stochastic gradient descent, which gets the model parameters to optimize and the defined learning rate. Then we have the typical training loop that does the batch optimization: we loop over the number of epochs, and inside that we loop over the training loader to get the different batches. For each batch we push the images and the labels to the device to get GPU support, do the typical forward pass, compute the loss, and then do the backward pass, where we must not forget to empty the gradients first with optimizer.zero_grad() before calling loss.backward() and optimizer.step(); then we print some information. When we are done, we evaluate the model, and as always we wrap this in a with torch.no_grad() statement, because we don't need backpropagation or gradient calculations here. Then we calculate the accuracy: the accuracy of the total network and also the accuracy for each single class. You can also find this script on my GitHub, so please check that out.

The only thing missing now is implementing the convolutional net. For this we define a class ConvNet, which must inherit from nn.Module, and as always we have to implement the __init__ function and the forward function for the forward pass. Let's have a look at the architecture again: first we have a convolutional layer followed by a ReLU activation function, then a max pooling, then a second convolutional layer with a ReLU and a max pooling, then three fully connected layers, and at the very end the softmax and the cross-entropy. The softmax is already included in the cross-entropy loss, so we don't need to add it ourselves.
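The loop pattern described here (forward pass, loss, zero_grad, backward, optimizer step, then evaluation under torch.no_grad) can be sketched on a tiny stand-in model with random data, so it runs without downloading CIFAR-10; the real script plugs the DataLoader batches in instead:

```python
import torch
import torch.nn as nn

# Stand-in model and random data so the sketch runs without the dataset.
model = nn.Linear(10, 3)                        # pretend classifier: 10 features -> 3 classes
criterion = nn.CrossEntropyLoss()               # softmax is applied inside this loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(4, 10)                     # one "batch" of 4 samples
labels = torch.tensor([0, 2, 1, 0])

for epoch in range(2):                          # loop over epochs (and batches, in the real script)
    outputs = model(images)                     # forward pass
    loss = criterion(outputs, labels)
    optimizer.zero_grad()                       # don't forget to empty the gradients first
    loss.backward()                             # backward pass
    optimizer.step()

with torch.no_grad():                           # no gradient tracking needed for evaluation
    _, predicted = torch.max(model(images), 1)  # index of the highest score = predicted class
    accuracy = (predicted == labels).sum().item() / labels.size(0)
print('accuracy on this batch:', accuracy)
```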
Now let's create all these layers. We say self.conv1 equals nn.Conv2d, and we have to specify the sizes: the input channel size is 3, because our images have three color channels, the output channel size is 6, and the kernel size is 5, so 5 by 5. Then we define a pooling layer, self.pool = nn.MaxPool2d, with a kernel size of 2 and a stride of 2. This is as in the image we saw: the kernel has size 2 by 2, and after each operation we shift it two pixels to the right, so the stride is 2. Then we define the second convolutional layer, self.conv2, where the input channel size must be equal to the previous output channel size, so 6; as output channel size let's say 16, and the kernel size is still 5. Now we have our convolutional layers, and we set up the first fully connected layer, self.fc1 = nn.Linear. As input size I will write 16 times 5 times 5 (I will explain in a second why), and as output size let's say 120; you can try out a different value here. The next fully connected layer, self.fc2, then has 120 input features and, say, 84 output features, and the final fully connected layer, self.fc3, has an input size of 84 and an output size of 10, because we have 10 different classes. So we have fc1, fc2, and fc3; you can change the 120 and the 84, but the 16 times 5 times 5 and the 10 must be fixed.

Now let's have a look at why the input size must be this number. Here I have a little test script that sets up exactly the same layers.
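The channel bookkeeping in these constructors is easy to check: nn.Conv2d stores its kernels with shape (out_channels, in_channels, kernel_h, kernel_w), so the second layer's input channels must match the first layer's output channels. A quick sketch, assuming PyTorch is installed:

```python
import torch.nn as nn

conv1 = nn.Conv2d(3, 6, 5)          # 3 input colour channels, 6 output channels, 5x5 kernel
pool  = nn.MaxPool2d(2, 2)          # 2x2 window, stride 2
conv2 = nn.Conv2d(6, 16, 5)         # input channels must equal conv1's 6 outputs
fc1   = nn.Linear(16 * 5 * 5, 120)  # nn.Linear stores weights as (out_features, in_features)

print(conv1.weight.shape)  # torch.Size([6, 3, 5, 5])
print(conv2.weight.shape)  # torch.Size([16, 6, 5, 5])
print(fc1.weight.shape)    # torch.Size([120, 400])
```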
At the beginning it loads the dataset, and let's also plot some images. Let's run it: python cnn_test.py. I have already downloaded the data, so it prints right away. The images are very blurry, but I think you can see that this is a horse, maybe a bird, and another horse (and one I don't recognize, actually). Let's run it again to maybe see some better pictures; still very blurry, but I think this is a deer, a car, a frog, and a ship.

Now let's look at the sizes. First we print images.shape, which is [4, 3, 32, 32]: our batch size is 4, we have three color channels, and our images have size 32 by 32. Now let's apply the first convolutional layer, x = conv1(images) (and I no longer want to plot the images), and print the size after this operation. The new size is [4, 6, 28, 28]: the 6 is the number of output channels we specified, and the image size is 28 by 28 because, as I said, the resulting image may be smaller since the filter doesn't fit into the corners. The formula to calculate the output size is: the input width, minus the filter size, plus 2 times the padding, divided by the stride, plus 1. In the slide example we have an input size of 5 by 5, a filter size of 3 by 3, padding 0, and stride 1, so the output size is 5 minus 3, which is 2, divided by 1, still 2, plus 1, which is 3; that's why the output image there is 3 by 3. Now we apply the same formula in our case: 32 minus the filter size 5 is 27, plus 0 is still 27, divided by 1 is still 27, plus 1 is 28. That's why here we have 28 by 28.
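This formula, output = (W - F + 2P) / S + 1, can be turned into a small helper and checked against both examples; `conv_output_size` is just an illustrative name:

```python
def conv_output_size(width, filter_size, padding=0, stride=1):
    """Output width of a convolution: (W - F + 2P) / S + 1."""
    return (width - filter_size + 2 * padding) // stride + 1

# The slide's example: 5x5 input, 3x3 filter, no padding, stride 1 -> 3x3
print(conv_output_size(5, 3))            # 3
# Our first conv layer: 32x32 input, 5x5 filter -> 28x28
print(conv_output_size(32, 5))           # 28
# The same formula also covers max pooling (2x2 window, stride 2): 28 -> 14
print(conv_output_size(28, 2, stride=2)) # 14
```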
Next let's apply the pooling layer; save and run this, and now the size is [4, 6, 14, 14]. This is because, as in the example, a pooling layer with a kernel size of 2 by 2 and a stride of 2 reduces the image size by a factor of 2. Now let's apply the second convolutional layer (let me clear this first and run it) and print the size after this operation. We could apply the formula again, as I just showed you, but PyTorch figures this out for us: the size is [4, 16, 10, 10], because the next output channel size we specified is 16 and the resulting image is 10 by 10. Then we apply another pooling operation, which again reduces the size by a factor of 2, so the final size after both convolutional layers and both pooling layers is [4, 16, 5, 5].

If we look at the architecture again, after the convolutional layers we want to put the result into the classification layers, so we have to flatten the 3D tensor into a 1D tensor. Looking at the size, the input size of the first linear layer is exactly what we have here: 16 times 5 times 5. It is very important to get this size right, but now we know why it must be 16 times 5 times 5, and we have the correct sizes.

Now we have all the layers defined, and we have to apply them in the forward pass. We say x equals the first convolutional layer applied to x, and after that we apply an activation function: I imported torch.nn.functional as F, so I can call F.relu with this as the argument. By the way, the activation function does not change the size. Then we apply the first pooling layer, self.pool, wrapped around this, and that is the first convolutional and pooling block.
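The whole size walkthrough can be reproduced with a random batch; this mirrors the little test script, assuming a batch of 4 CIFAR-sized images:

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 6, 5)
pool  = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(4, 3, 32, 32)  # batch of 4 images, 3 channels, 32x32
x = conv1(x); print(x.shape)   # torch.Size([4, 6, 28, 28])  (32-5)/1+1 = 28
x = pool(x);  print(x.shape)   # torch.Size([4, 6, 14, 14])  halved by pooling
x = conv2(x); print(x.shape)   # torch.Size([4, 16, 10, 10]) (14-5)/1+1 = 10
x = pool(x);  print(x.shape)   # torch.Size([4, 16, 5, 5])   halved again
x = x.view(-1, 16 * 5 * 5)     # flatten for the fully connected layers
print(x.shape)                 # torch.Size([4, 400])
```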
Then we do the same thing with the second convolutional layer, and next we have to pass the result to the first fully connected layer. For this we have to flatten it, which we do by saying x = x.view(-1, 16*5*5). For the first size we can simply say -1, and PyTorch will automatically figure out the correct value for us; this is the number of samples in our batch, 4 in this case. Now our tensor is flattened, and we call the first fully connected layer, x = self.fc1(x), and apply an activation function again; we simply use the ReLU. (I also have a whole tutorial about activation functions, so please check that out if you haven't already.) After this we apply the second fully connected layer, again with a ReLU activation, and at the very end we have x equals the last fully connected layer, self.fc3(x), with no activation function at the end and also no softmax, because that is already included in the loss we set up earlier. Then we simply return x, and this is the whole ConvNet model; now you should know how to set this up.

Then we create our model here and continue with the training loop that I already showed you. Let's save and run this (clear this first): python cnn.py, and hope that the training starts. Oh, one thing I forgot, of course, is to call the super init; never forget to call super(ConvNet, self).__init__(). Let's clear this again and try one more time, and now the training starts. I don't have GPU support on my MacBook, so this can take a few minutes; I will skip ahead and continue when the training is done. See you in a second.
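Putting the pieces together, the finished model class looks roughly like this (my reconstruction of the layers and forward pass described above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()       # don't forget the super call!
        self.conv1 = nn.Conv2d(3, 6, 5)       # 3 colour channels in, 6 out, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)        # 2x2 window, stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120) # 16*5*5 and the 10 below are fixed
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)          # 10 output classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # first conv + relu + pool
        x = self.pool(F.relu(self.conv2(x)))  # second conv + relu + pool
        x = x.view(-1, 16 * 5 * 5)            # flatten to (batch, 400)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                       # no softmax: CrossEntropyLoss adds it
        return x

model = ConvNet()
out = model(torch.randn(4, 3, 32, 32))
print(out.shape)  # torch.Size([4, 10])
```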
Alright, we are back and the training has finished. If we have a look, we can see that the loss slowly decreased, and then we have the final evaluation: the accuracy of the total network is 46.6%, and the accuracy for each class is listed here. It's not very good, and that's because we only specified four epochs here, so you might want to try more epochs. But now you should know how a convolutional neural net can be implemented. I hope you enjoyed this tutorial; if you did, please leave a like, subscribe to the channel, and see you next time. Bye!
Info
Channel: Python Engineer
Views: 56,569
Keywords: Python, Machine Learning, ML, PyTorch, Deep Learning, DL, Python DL Tutorial, PyTorch Tutorial, CNN, Convolutional Neural Network, PyTorch Course
Id: pDdP0TFzsoQ
Length: 22min 6sec (1326 seconds)
Published: Fri Feb 07 2020