PyTorch Neural Network Example

Captions
In this video I want to walk through how to create a simple fully connected neural network in PyTorch. There's a lot of information online, and I know it can be overwhelming, so let's go through this step by step. The outline for this video: we're going to import all the necessary packages from PyTorch; create our fully connected neural network, a very simple one; do some initialization for our model, like the learning rate, etc.; load data (in this example we look at a very simple dataset, the MNIST dataset); initialize the network we created and set up the loss function and the optimizer we're going to use; then train the network; and in the end see how good the trained network is on the training and test sets.

To begin with, we want to import all the necessary packages. Let me go through what each of them does. torch is just the entire library. torch.nn, imported as nn: essentially this is all the neural network modules. If we define our feedforward network we use nn.Linear, which is included in the nn module; if we do something more complex or advanced, like convolutional neural networks, that's also in this module, and loss functions are inside here too. torch.optim, imported as optim: this is all the optimization algorithms, like stochastic gradient descent, Adam, etc. torch.nn.functional, imported as F: essentially all the functions that don't have any parameters, so activation functions like relu, tanh, etc. are in this one. They're also included in the nn package, so it's a little confusing; you could use either, and we're going to use this one in this example. Then we have from torch.utils.data import DataLoader: the DataLoader gives us easier dataset management; it helps us create mini-batches to train on, etc.
Then torchvision.datasets, imported as datasets: PyTorch has a lot of standard datasets and makes it very easy for us to import them in a nice way; we're going to use this one to import our MNIST dataset. Lastly, torchvision.transforms, imported as transforms: we're not going to go into this in depth, but essentially it has transformations we can perform on our dataset.

All right, now that we have the packages and roughly understand what they do, let's create our fully connected network. We create a class NN, inherit from nn.Module, and write our initialization. The first thing we do is call super().__init__(); essentially, super calls the initialization method of the parent class, which in this case is nn.Module, so we run that class's initialization. Then we create a very, very small network, just two layers. The inputs are some input_size and some num_classes; in our case the input size will be 784, since the MNIST dataset has 28x28 images, which is 784 nodes. So the first layer is nn.Linear from the input size to a hidden layer of 50 nodes, very small, and another nn.Linear from 50 to the number of classes. The next step is to define the forward method. forward runs on some input x, and we apply the layers we initialized in __init__: we call self.fc1 on x, then call relu, a nonlinearity, on that, and call the result x just for simplicity; then we call self.fc2 on x, again call it x, and return x.

OK, one idea is to check that the model runs and gives the correct shapes on some randomly generated data.
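The two-layer network described above can be sketched as follows (the class name NN and the hidden size of 50 follow the walkthrough):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NN(nn.Module):
    def __init__(self, input_size, num_classes):
        super().__init__()                     # run nn.Module's initialization
        self.fc1 = nn.Linear(input_size, 50)   # input -> 50 hidden nodes
        self.fc2 = nn.Linear(50, num_classes)  # 50 -> number of classes

    def forward(self, x):
        x = F.relu(self.fc1(x))  # first layer plus nonlinearity
        x = self.fc2(x)          # second layer, raw scores
        return x
```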
Let's say we initialize our model to be NN(784, 10), with 10 for the number of digits, and initialize some x of shape 64 x 784. The 64 is the number of examples we run simultaneously, the number of images, so this is our mini-batch size. Then all we do is run the model on that input and print the shape. We want this to return 64 x 10: ten values for each image, one per digit, i.e. what is the score that it's a 0, a 1, et cetera. We see that it returns 64 x 10, so we have a rough idea that it does what it's supposed to do.

Now we want to initialize a device, so we can run the model on either the GPU or the CPU: device = torch.device("cuda" if torch.cuda.is_available() else "cpu"). If CUDA is available we run on the GPU; if not, we run on the CPU. For the hyperparameters, in this case we have an input size of 784, the number of classes is 10, the learning rate is 0.001, the batch size is 64, and the number of epochs we want to train is, let's say, just 1.

Now we want to load data, and as I said in the beginning we can use datasets from torchvision: datasets.MNIST. First we give it a root, i.e. where it should save the dataset; let's just create a folder called dataset. Then train=True: this is going to be the training set, train_dataset. Then transform=transforms.ToTensor(), which is all we're going to use transforms for in this case. I think when it loads the data it's going to be NumPy arrays or something like that, and we just want to transform it to a tensor so it can run in PyTorch.
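The shape check, device setup, and hyperparameters above, sketched together (the NN class is repeated so the snippet is self-contained):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NN(nn.Module):  # as defined earlier
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, 50)
        self.fc2 = nn.Linear(50, num_classes)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

# quick sanity check on random data: 64 is the mini-batch size
model = NN(784, 10)
x = torch.randn(64, 784)
print(model(x).shape)  # torch.Size([64, 10]) -> ten scores per image

# run on the GPU if CUDA is available, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# hyperparameters
input_size = 784      # 28 x 28 MNIST images, flattened
num_classes = 10      # digits 0-9
learning_rate = 0.001
batch_size = 64
num_epochs = 1
```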
Then download=True: if the dataset is already in the dataset folder, PyTorch is not going to download it, but if it isn't there, we want to download it. Then we create a train loader using the DataLoader: we pass dataset=train_dataset, set batch_size=batch_size, which is 64 in this case, and shuffle=True, which shuffles the batches each epoch: when the network has gone through each image in the training set and trains for a second epoch, this makes sure we don't have the exact same images in a batch every epoch. Then we do the same thing for the test set: test_dataset with train=False, and a test_loader.

Now we initialize the network: our model is NN(input_size=input_size, num_classes=num_classes), and then we call .to(device); this is what we talked about before, CUDA if possible, otherwise CPU. For our loss function we use nn.CrossEntropyLoss(), and for the optimizer we use Adam, optim.Adam, where we first pass all the parameters we have, model.parameters(), and set lr=learning_rate, which we initialized in the hyperparameters.

So, next step: we've imported the packages, we've loaded the data, we've initialized the network, we've created the loss and optimizer; now we want to train the network. First we run for epoch in range(num_epochs); one epoch essentially means the network has seen all the images in the dataset once. Then we go through each batch in our train loader: for data, targets in train_loader. One thing that is common is to also keep track of which batch index we have.
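A sketch of the loading and initialization steps above. To keep it runnable without downloading anything, a small random TensorDataset stands in for MNIST here and nn.Sequential stands in for the NN class; the actual datasets.MNIST calls from the walkthrough are shown in comments:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# In the walkthrough, the real datasets are loaded like this:
# train_dataset = datasets.MNIST(root="dataset/", train=True,
#                                transform=transforms.ToTensor(), download=True)
# test_dataset  = datasets.MNIST(root="dataset/", train=False,
#                                transform=transforms.ToTensor(), download=True)
# Here, a small random stand-in dataset avoids the download:
fake_images = torch.randn(256, 1, 28, 28)          # MNIST-shaped images
fake_labels = torch.randint(0, 10, (256,))         # digits 0-9
train_dataset = TensorDataset(fake_images, fake_labels)

batch_size = 64
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size,
                          shuffle=True)  # reshuffle batches every epoch

# initialize network, loss, and optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(),
                      nn.Linear(50, 10)).to(device)  # stand-in for NN(...)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
```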
How we do that is we take enumerate(train_loader) and unpack batch_idx and, in a tuple, data and targets. The data is essentially the images; the targets are the correct digit for each image. Then we move the data, which is currently a tensor, to the device we're using: data = data.to(device=device) and targets = targets.to(device=device).

Now, if we run this and check the shape (it's going to download the dataset first), we see that the number of examples, the number of images we have, is 64, which is correct. Then we have a 1 for the number of channels: MNIST is black and white, so it only has one input channel; if we had colored images this would be RGB, so 3 in that case. The last two are just the height and width of each image; MNIST is 28 by 28. But we want this to be a single 784, so we want to unroll this matrix into a long vector. One way to do it is data = data.reshape(data.shape[0], -1): keep the first dimension at 64, then the -1 essentially flattens all the rest into a single dimension. So the first step gets the data to CUDA if possible, and this one reshapes it into the correct shape.

Then we do the forward pass of the neural network, which is quite easy: we just call the model on the data, and call the result scores. Then we call the loss function, which we defined as criterion: criterion takes the input first, which is scores, and then the target, targets, which are the correct labels.
Then we want to go backward in the network. As I said before, we don't have to actually implement backpropagation ourselves, so this is quite easy in PyTorch: we just call loss.backward(). One thing we need to remember, though, is to do optimizer.zero_grad(): we want to set all the gradients to zero for each batch, so that it doesn't store the backprop calculations from previous forward passes. Then in the end we do a gradient descent, or Adam, step, which is also easy: optimizer.step(). Here we update the weights depending on the gradients computed in loss.backward().

That's the training. The last thing we want to do is check the accuracy, to see if the model performs any good. We define check_accuracy, and let's give it the loader and the model as inputs. First we set the number of correct predictions to 0 and the number of samples to 0. Then we call model.eval(): we want to set the model to evaluation mode, because if we used certain other techniques, we'd want to let the model know this is evaluation mode, which might impact how the calculations are done. Then we use with torch.no_grad(): when we just check the accuracy we don't have to actually compute the gradients, which would be unnecessary computation, so this lets PyTorch know it doesn't have to compute any gradients in these calculations. Then for x, y in loader: first x = x.to(device=device) and y = y.to(device=device), then we reshape, essentially the same thing we did before, x = x.reshape(x.shape[0], -1), and then compute scores = model(x).
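The whole training loop described above, sketched end to end; again a random stand-in dataset replaces MNIST so the snippet runs offline, and nn.Sequential stands in for the NN class:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(),
                      nn.Linear(50, 10)).to(device)  # stand-in for NN(...)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# random stand-in data shaped like MNIST batches
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,))),
    batch_size=64, shuffle=True)

num_epochs = 1
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(train_loader):
        data = data.to(device=device)           # get data to CUDA if possible
        targets = targets.to(device=device)
        data = data.reshape(data.shape[0], -1)  # (64, 1, 28, 28) -> (64, 784)

        scores = model(data)                    # forward pass
        loss = criterion(scores, targets)

        optimizer.zero_grad()                   # clear gradients from the last batch
        loss.backward()                         # backward pass
        optimizer.step()                        # Adam update of the weights
```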
Now we want to take scores.max along the second dimension, so we pass 1 (we could set 0 or 1 here). If we look at the shape of scores, it would be, as we saw previously, 64 by 10, and we want to know which of those 10 digits has the maximum score: if the maximum value were at the first position, the digit would be a zero. So we do _, predictions = scores.max(1); the first value it gives us is the maximum itself, but we're not really interested in the value, we're interested in the index of the maximum value along the second dimension. Then num_correct accumulates the predictions that are equal to the correct labels, (predictions == y).sum(), and num_samples accumulates predictions.size(0), so essentially that 64. Then we want to compute the accuracy, and I guess we can just print it with an f-string: got num_correct / num_samples with accuracy so-and-so, just to print it in a nice way. Remember these are tensors, so we convert them into floats, and rather than printing something like 95.1234567 we'd rather print with two decimals, which we can do in f-strings with :.2f. And that should be all, I think. Then we just return the model to training mode, model.train(), after we've checked the accuracy. I guess it's not necessary for this example, because we're done with the training part, but if you check the accuracy during training to see that it actually improves, then this is something you'd want to have.

Then in the end we want to run check_accuracy(train_loader, model), and similarly on the test set. One thing we can also do is check at the beginning: if loader.dataset.train, then we know it's running on the training data.
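A sketch of the check_accuracy function as described, exercised on a random stand-in loader and model so it runs without MNIST:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def check_accuracy(loader, model):
    num_correct = 0
    num_samples = 0
    model.eval()                  # evaluation mode
    with torch.no_grad():         # no gradients needed just to check accuracy
        for x, y in loader:
            x = x.to(device=device)
            y = y.to(device=device)
            x = x.reshape(x.shape[0], -1)      # flatten to (batch, 784)
            scores = model(x)                  # (batch, 10)
            _, predictions = scores.max(1)     # index of the max score per row
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)
    print(f"Got {num_correct} / {num_samples} with accuracy "
          f"{float(num_correct) / float(num_samples) * 100:.2f}")
    model.train()                 # return the model to training mode

# usage on a stand-in model and loader
model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(),
                      nn.Linear(50, 10)).to(device)
loader = DataLoader(
    TensorDataset(torch.randn(128, 1, 28, 28), torch.randint(0, 10, (128,))),
    batch_size=64)
check_accuracy(loader, model)
```

On random data the printed accuracy will hover around 10 percent, since an untrained model is essentially guessing among ten digits.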
So in that case we can print "Checking accuracy on training data", and if not, we can print "Checking accuracy on test data". And I think this should be all, so let's see if it runs. In my case it's running on CUDA because I have a GPU; if you don't, it's going to run on the CPU, and there isn't really going to be any difference. Let's see, we got an error: return accuracy. Right, I tried to return an accuracy that we don't actually have to return. OK, now it runs: checking accuracy on training data, ninety-three percent. All right, so training just one epoch on this small network actually gives quite OK accuracy; well, it depends, 93 percent is OK, I guess. And in this case it actually performs better on the test data. I think if we run this for more epochs it's going to perform even better, and if we make the network larger it's definitely going to improve. But the idea was not to get the best accuracy; it's just to show an example of how to create a simple neural network in PyTorch. So I hope you enjoyed this video, and I hope to see you in the next one.
Info
Channel: Aladdin Persson
Views: 31,473
Rating: 4.9674797 out of 5
Keywords: pytorch neural network tutorial, Pytorch neural network, Pytorch feed forward network, pytorch simple neural network example
Id: Jy4wM2X21u0
Length: 23min 32sec (1412 seconds)
Published: Sat Apr 04 2020