Deep Learning With PyTorch - Full Course

welcome guys to this all-in-one pie torch video this video takes all parts from my beginner pie torch playlist and combines it into one single video the course goes from zero to intermediate level and teaches you all the fundamentals you have to know to be confident with this deep learning framework i will leave timestamps for each section in the description and all code is available on github now before we start i'd like to thank our sponsor of this course tab 9. tab 9 is an ai powered auto completion tool that integrates with your ide and helps you to code faster it supports all modern programming languages and detects which language you are working in in fact i've been using this tool myself for a while now and i have to say i'm really impressed with this functionality it's one of my favorite plugins for vs code now and it also integrates with other ids it uses deep learning under the hood and was trained on code from millions of repositories on github and with this knowledge the tool can make predictions for code completions and suggestions that help you to code faster reduce mistakes and even discover best coding practices and i have to admit this works really well and it helps me in my workflow and one nice and important thing to mention is that your code is totally safe so you have complete privacy because tab 9's local completion model runs on your machine without sending any of your code anywhere and the best part is it's free their basic plan is free forever with the option to upgrade if you want a more advanced model so i encourage you to download it test it for yourself and see if you like it i will leave you a link in the description and with that let's get started and if you enjoy the content be sure to like and subscribe hi everybody welcome to a new tutorial series in this series we are going to learn how to work with pie torch pie torch is one of the most popular machine learning and deep learning frameworks it's really fun to work with it and develop cool applications so i hope you watch the series and learn all about the necessary basics for this framework so in this first video i show you how we install pytorch so let's start and for this we go to the official website pytorch.org then click on get started then select the newest pi touch build so right now this is version 1.3 then select your operating systems in my case it's a mac then select the package manager with which you want to install pytorch so i highly recommend to use anaconda and if you haven't installed anaconda yet and don't know how to use it then please watch my other tutorial about anaconda so i will put the link in the description below and then select the newest python version so here i select python 3.7 and unfortunately on the mac you can only install the cpu version right now but if you are on linux or windows and want to have gpu support um then you can also install or have to install the cuda toolkit first so the cuda toolkit is a development environment for creating high performance gpu accelerated applications for this you need an nvidia gpu in your machine and if you have that then you can go to the website developer.nvidia.com slash cuda minus downloads and then we have to be careful because right now the newest supported cuda version by pytorch is cuda 10.1 so we have to get this version so right now the newest version is 10.2 so we have to go to legacy releases then select the newest cuda toolkit 10.1 then select your operating system so for example windows windows 10 then download the 
installer and follow the instructions and this will also check if your system is suitable for the cuda toolkit so if this is successful then we can go back to the pie charts site and copy this command so in my case on the mac now i need this command so let's copy this and now let's open up a terminal and first of all we want to create a virtual environment with conda in which we want to install all of our packages and install pytorch so let's create a environment let's say conda create minus n and now give it a name so i call this pi torch simply pi torch and then also specify the python version so let's say python equals 3.7 and then hit enter now this will create your virtual environment with python 3.7 let's hit enter again to proceed and this will take a while and now it's done so now we can activate this environment with conda actuate pi torch and now we are inside of this environment and we can see this because here in the beginning we have pytorch in parentheses so this is the name of the environment and now let's paste our installation command from the website so this will install pi torch and all the necessary packages so this will also take a couple of seconds now again let's hit enter to proceed and now it's done so now we have installed pytorch and we can verify that by starting python inside this environment so let's say or type python and enter and now we have python running and now we can import the torch module so if the installation was not correct and right now you would get a module not found error but in this case it is correct and now we can for example create a torch tensor so let's say x equals torch dot rand end of size three and now we want to print our tensor so this also works and now we can also check if cuda is available so we can say tor torch dot cuda dot is underscore available so in my case it says false but if you've installed the cuda toolkit and also the gpu supported pytorch packages then this should say true so yeah so now we have installed in our uh pie chart and can get started working with it so i hope you enjoyed this and see you in the next tutorial bye hi everybody welcome to a new pie torch tutorial in this video we are going to learn how to work with tensors so how we can create tensors and some basic operations that we need we will also learn how to convert from numpy arrays to pi torch tensors and vice versa so let's start so in pytorch everything is based on tensor operations from numpy you probably know arrays and vectors and now in pi torch everything is a tensor so a tensor can have different dimensions so it can be 1d 2d or even 3d or have more dimensions so let's create an empty tensor so first of all we import torch of course and then we say x equals torch dot empty and then we have to give it a size so for example if we just say one then this is like a scalar value so let's print our tensor so this will print an empty tensor so the value is not initialized yet and now we can change the size so for example if we say three here then this is like a one d vector with three elements so now if you run this we see three items in our tensor and now we can also make it 2d so for example let's say the size is two by three so this is like a 2d matrix and i'll run this and of course we can put even more dimensions in it so now it would be 3d and now for example now it would be 40 but now i don't print it anymore because um it's hard to to see the the four dimensions but yeah this is how we can create an empty uh tensor and we can also for example 
create a tensor with random values by saying torch.rand and then give it the size, so let's say two by two, and let's print our tensor again. we can also, the same as in numpy, say torch.zeros, which fills it with zeros, or torch.ones, which fills all the items with ones. then we can also give it a specific data type: first we can have a look at the data type by saying x.dtype, and if we run this we see that by default it's float32, but we can also pass the dtype parameter, and here we can say for example torch.int so now it's all integers, or torch.double so now it's doubles, or also for example torch.float16. and if we want to have a look at the size we can do this by saying x.size(), and this is a function so we have to use parentheses, so this will print the size. we can also construct a tensor from data, for example from a python list: here we can say x = torch.tensor and then put in a list with some elements, let's say 2.5 and 0.1, and then print our tensor, so this is also how we can create a tensor. now let's talk about some basic operations we can do. let's create two tensors with random values of size two by two, so x and y equal torch.rand(2, 2), and let's print x and print y. now we can do simple addition, for example by saying z = x + y, and let's print our z, so this will do element-wise addition, it will add up each of the entries. we could also use z = torch.add(x, y), which does the same thing. we could also do an in-place addition: for example if we say y.add_(x) and then print y, this will modify our y and add all of the elements of x to our y. and by the way, in pytorch every function that has a trailing underscore does an in-place operation, so it will modify the variable it is applied on. next to addition we could of course also use subtraction, so we can say z = x - y, or this would be the same as z = torch.sub(x, y), and if we print z then we can see the element-wise subtraction. then we can also do a multiplication of each element, this would be torch.mul, and again we can do everything in place by saying y.mul_(x), which would modify our y. and then we can also do element-wise division, this would be torch.div. so these are some basic operations we can do with tensors. then we can also do slicing operations like you are used to from numpy arrays. let's say we have a tensor of size five by three, and let's print this first. now for example we can get all rows but only one column, so let's use slicing: here we use a colon for all the rows but only column zero, so let's print the whole tensor and only this, and here we see we have only the first column but all the rows. or we can say for example row number one but all columns, so this would print the second row and all the columns. then we can also just get one element, so the element at position one one, and this will give us that value. and by the way, right now it prints a tensor, and if we have a tensor with only one element we can also call the .item() method, so this will get the actual value, but be careful, you can only use this if you have only one element in your tensor.
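For reference, here is a compact sketch of the tensor operations covered in this part; the shapes, values, and dtypes are just the examples used in the video.

```python
import torch

# creation
x = torch.empty(2, 3)                     # uninitialized values
x = torch.rand(2, 2)                      # uniform random values in [0, 1)
x = torch.zeros(2, 2, dtype=torch.int)    # all zeros, integer dtype
x = torch.ones(2, 2, dtype=torch.double)  # all ones, double dtype
print(x.dtype, x.size())

# from a python list
x = torch.tensor([2.5, 0.1])

# element-wise operations (each has an in-place version with a trailing underscore)
a = torch.rand(2, 2)
b = torch.rand(2, 2)
c = a + b          # same as torch.add(a, b)
b.add_(a)          # in-place: modifies b
c = a - b          # torch.sub(a, b)
c = a * b          # torch.mul(a, b)
c = a / b          # torch.div(a, b)

# slicing and .item()
x = torch.rand(5, 3)
print(x[:, 0])          # all rows, first column
print(x[1, :])          # second row, all columns
print(x[1, 1].item())   # .item() only works for a single-element tensor
```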
let's talk about reshaping a tensor so let's say we have a tensor of size let's say four by 4x4 and print our tensor and now if you want to reshape it then we can do this by saying or by calling the view method so we say y equals x dot view and then give it a size so let's say we only want one dimension now so let's print y um so now it's only a one d vector um and of course the number of elements must still be the same so here we have four by four so in total it's also 16 values and for example if we don't want to put the dimension or the value in one dimension and we can simply say minus one and then specify the other dimension and pi torch will automatically determine the right size for it so now it must be a two by eight um tensor so we can also print the size again to have a look at the size so this is size two by eight so it's correctly determined the size if we put a minus one here so yeah this is how we can resize tensors and now let's talk about converting from numpy to a torch tensor and vice versa so this is very easy so first of all let's import numpy again or import numpy snp and i think i have to oh no it's already installed here so let's create a tensor first so a equals torch dot and let's create a tensor with ones of size five so let's print our tensor and now if we want to have a numpy array we can simply say b equals a dot numpy and then print b so now we have a numpy array so if we print the type of b um and then this will see and this will print that we have a numpy and d array so yeah this is how we can create from a tensor to a numpy array um but now we have to be careful because if the tensor is on the cpu and not the gpu then both objects will share the same memory location so this means that if we change one we will also change the other so for example if we print or if we modify b or a in place by saying a dot at underscore remember all the underscore functions will modify our variable in place and add one so if we add one to each element and now first let's have a look at our a tensor and now let's also have a look at our b numpy array then we see that it also added plus one to each of the elements here because they both point to the same memory location so be careful here and yeah if you do want to do it the other way around so if you have a numpy array in the beginning so let's say a equals numpy um once of size five and then print a and now you want to have a torch tensor from a numpy array then you can say b equals torch and then from underscore numpy and then put the numpy array so now we have a tensor and this will yeah by default this will put in the data type float64 of course you could also specify the data type here if you want a different data type um and now again we have to be careful if we modify one so if we modify for example the numpy array by um incrementing each element so now print our numpy array so we see that it incremented each value and if we print b then we see that our tensor got modified too so again be careful here um yeah but this happens only if your tensor is on the gpu and this is one thing that we haven't talked about yet because you can also do the operations on the gpu but only if this is available so if you have also installed the cuda toolkit and you can check that by saying if torch dot cuda dot is available and so in my case on the mac it will and this will return false but for example if you are on windows and you have cuda available then you can specify your cuda device by saying device equals torch dot device and then 
say cuda here and then if you want to create a tensor on the gpu you can do this by saying x equals torch dot once and then for example give it the size and then say device equals device so this will create a tensor and put it on the gpu or you can first create it so simply by saying y equals torch dot once of size five and then you move it to your device to your gpu by saying y equals y dot 2 and then device so this will move it to the device and now if you do an operation for example c equals x plus y then this will be performed on the gpu and might be much faster um yeah but now you have to be careful because now if you would call c dot numpy then this would return an error because um numpy can only handle cpu tensors so you cannot convert a gpu tensor back to numpy so then again we would have to move it back to the cpu so we can do this by saying c equals c dot 2 and then as a string cpu so now it would be on the cpu again so yeah this is all the basic um operations that i wanted to show you and one more thing a lot of times when a tensor is created for example torch dot um once of size five then a lot of times you see the argument requires grad equals true so by default this is false and now if we print this then we will also see here in our tensor that it will print requires grad equals true so a lot of times in code you will see this and this will tell pi torch that it will need to calculate the gradients for this tensor later in your optimization steps so whenever this means that whenever you have a variable in your model that you want to optimize then you need the gradients so you need to specify requires grad equals true but yeah we will talk about this more in the next tutorial so i hope you enjoyed this tutorial and if you liked it please subscribe to the channel and see you next time bye hi everybody welcome to a new pie torch tutorial today we learn about the autograd package in pi torch and how we can calculate gradients with it gradients are essential for our model optimization so this is a very important concept that we should understand luckily pi touch provides the autograd package which can do all the computations for us we just have to know how to use it so let's start to see how we can calculate gradients in pi charge so first of all we import torch of course and now let's create a tensor x equals torch dot rand n of size 3 and now let's print our x so this is a tensor with three values so three random values and now let's say later we want to calculate the gradients of some function with respect to x then what we have to do is we must specify the argument requires grad equals true so by default this is false and now if we run this again then we see that also pi touch tracks that it requires the gradient and now whenever we do operations with this tensor pi torch will create a so-called computational graph for us so now let's say we do the operation x plus two and we store this in an output so we say y equals x plus two then this will create the computational graph and this looks like this so for each node we have a for each operation we have a node with inputs and an output so here the operation is the plus so in addition and our inputs are x and 2 and the output is y and now with this graph and the technique that is called back propagation we can then calculate the gradients i will explain the concept of back propagation in detail in the next video but for now it's fine to just know that we or how we can use it first we do a forward pass so here we apply this 
operation and in the forward pass we calculate the output y and since we specified that it requires the gradient pytorch will then automatically create and store a function for us and this function is then used in the back propagation and to get the gradients so here y has an attribute grad underscore fn so this will point to a gradient function and in this case it's called at backward and with this function we can then calculate the gradients in the so-called backward path so this will calculate the gradient of y with respect to x in this case so now if we print y then we will see exactly this graph fn attribute and here this is an at backward function so because here our operation was a plus and then our um then we do the back propagation later so that's why it's called add backward and let's do some more operation with our tensors so let's say we have c equals y times y times two for example so this tensor then also has this great function attribute so here grad fn equals mul backward because here our operation is a multiplication and for example we can say c equals c dot mean so we can apply a mean operation and then our gradient function is the mean backward and now when we want to calculate the gradients the only thing that we must do is to call c dot backward so this will then calculate the gradient of c with respect to x so x then has a gradient a dot grad attribute where the gradients are stored so we can print this and now if you run this then we see that we have the gradients here in this tensor so this is all we have to do and now let's have a look what happens when we don't specify this argument so first of all if we print our tensors then we see that they don't have this grad function attribute and if we try to call the backward function then this will produce an error so it says tensors does not require grad and does not have the grad function so remember that we must specify this argument and then it will work and one thing that we should also know is so in the background what this basically does this will create a so-called vector jacobian product to get the gradients so this will look like this i will not go into the mathematical details but we should know that we have the jacobian matrix with the partial derivatives and then we multiply this with a gradient vector and then we will get the final the final gradients that we are interested in so this is also called the chain rule and i will also explain this more in detail in the next video but yeah we should know that actually we must multiply it with a vector so in this case since our c is a scalar value we don't have to put the um don't have to use an argument here for our backward function so our c here has only one value so this is fine but let's say we didn't apply the mean operation so now our c has more than one value in it so it's also size one by three and now when we try to call the backward function like this then this will produce an error so great can be implicitly created only for scalar outputs so in this case we have to give it the gradient argument so we have to create a vector of the same size so let's say v equals torch dot tensor and here we put for example point one one point zero and point zero zero one and we give it a data type of torch dot float 32 and then we must pass this vector to our backward function and now it will work again so now if we run this then this is okay so we should know that in the background this is a check a vector jacobian product and a lot of times the last operation is some 
operation that will create a scalar value so this is it's okay to call it like this without an argument but if this is not an a scalar then we must give it the the vector and yeah then some other thing that we should know is how we can prevent um pi torch from tracking the history and calculating this grad fn attribute so for example sometimes during our training loop when we want to update our weights then this operation should not be part of the gradient computation so in one of the next tutorials i will give a concrete example of how we apply this autocrat package and then it will become clearer maybe but yeah for now we should know how we can prevent this from from tracking the gradients and we have three options for this so the first one is to call the requires grad underscore function and set this to false the second option is to call x dot detach so this will create a new tensor that doesn't require the gradient and the second option would be to wrap this in a width statement so with torch dot no grad and then we can do our operations so yeah let's try each of these so first we can say x dot requires grad underscore and set this to false so whenever a function has a trailing underscore in pi torch then this means that it will modify our variable in place so now if we print x then we will see that it doesn't have this require grad attribute anymore so now this is false so this is the first option and the second option would be to call x detach so we say y equals x dot detach so this will create a new vector with the same or a new tensor with the same values but it doesn't require the gradient so here we see that our y has the same values but doesn't require the gradients and the last option is to wrap it in a torch in a width with statement with torch dot no grad and then we can do some operations for example y equals x plus 2 and now if we print our y then we see that it doesn't have the gradient function attribute here so yeah if we don't use this and would run it like this then our y has the gradient function so these are the three ways how we can stop um pi touch from creating this gradient functions and tracking the history in our computational graph and now one more very important thing that we should also know is that whenever we call the backward function then the gradient for this tensor will be accumulated into the dot grad attribute so their values will be summed up so here we we must be very careful so let's create some dummy training example where we have some have some weights so this is a a tensor with ones in it of size let's say four and they require the gradient so require scrut equals true and now let's say we have a training loop where we say for epoch in range and first let's only do one iteration and here we do let's say model output equals um let's say weights times 3 dot sum so this is just a dummy operation which will simulate some model output and then we want to calculate the gradients so we say model output dot backward and now we have the gradient so we can call weights dot grad and print this um so our gradients here are three so the tensor is filled with threes and now if we do another iteration so if we say we have two iterations then the second backward call will again accumulate the values and write them into the grad attribute so now our grabs has sixes in it and now if we do a third iteration then it has nines in it so all the values are summed up and now our weights or our gradients are clearly incorrect so before we do the next iteration and 
optimization step we must empty the gradients so we must call weights dot grad dot zero underscore and now if we run this then our gradients are correct again so this is one very important uh thing that we must know during our training steps and later we will work with the pytorch built in optimizers so let's say we have a optimizer from the torch optimization package so torch dot optim dot sgd for stochastic gradient descent which has our weights as parameters and some learning rate and now with this optimizer we can call or do a optimization step and then before we do the next iteration we must call the optimizat optimizer dot zero grad function which will do exactly the same so yeah we will talk about the optimizers in some later tutorials but yeah for now the things you should remember is that whenever we want to calculate the gradients we must specify the require scrat parameter and set this to true then we can simply calculate the gradients with i'm calling the backward function and before we want to do the next operation or the next iteration in our optimization steps we must empty our gradients so we must call the zero function again and we also should know how we can prevent some operations from being tracked in the computational graph and that's all i wanted to show you for now with the autograd package and i hope you liked it please subscribe to the channel and see you next time bye hi everybody welcome to a new pie torch tutorial in this video i'm going to explain the famous backpropagation algorithm and how we can calculate gradients with it i explain the necessary concepts of this technique and then i will walk you through a concrete example with some numbers and at the end we will then see how easy it is to apply back propagation in pi torch so let's start and the first concept we must know is the chain rule so let's say we have two operations or two functions so first we have the input x and then we apply a function a and get an output y and then we use this output as the input for our second function so the second function b and then we get the final output c and now we want to minimize our c so we want to know the derivative of c with respect to our x here in the beginning and we can do this using the so-called chain rule so for this we first compute the derivative of c with respect to y and multiply this with the derivative of of y with respect to x and then we get the final derivative we want so first here we compute the derivative at this position so the derivative of this output with respect to this input and then here the derivative of this output with respect to this input and then we multiply them together and get the final gradient we are interested in so that's the chain rule and now the next concept is the so-called computational graph so for every operation we do with our 10 source pi torch will create a graph for us so where at each node we apply one operation or one function with some inputs and then get an output so here in this case in this example we use a multiplication operation so we multiply x and y and then get c and now at these notes we can calculate so-called local gradients and we can use them later in the chain rule to get the final gradient so here the local gradients we can compute two gradients the gradient of c with respect to x and this is simple since we know this function here so this is the gradient gradient of x times y with respect to x which is y and here in the bottom we compute the derivative of x times y with respect to y which is x 
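As a quick sanity check, this is exactly what autograd reports for a single multiplication node; a minimal sketch with made-up input values:

```python
import torch

# one multiplication node: z = x * y
# the local gradients should be dz/dx = y and dz/dy = x
# (the values 3.0 and 4.0 are arbitrary, just for illustration)
x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(4.0, requires_grad=True)
z = x * y

z.backward()        # backward pass through this single node
print(x.grad)       # tensor(4.) -> equal to y
print(y.grad)       # tensor(3.) -> equal to x
```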
so local gradients are easy because we know the function at each node. and why do we want them? because typically our graph has more operations, and at the very end we calculate a loss function that we want to minimize, so we have to calculate the gradient of this loss with respect to our parameter x at the beginning. now let's suppose that at this position we already know the derivative of the loss with respect to our c. then we can get the final gradient we want with the chain rule: the gradient of the loss with respect to x is the gradient of the loss with respect to c times our local gradient, so the derivative of c with respect to x, and this is how we get the final gradient. so the whole concept consists of three steps: first we do a forward pass where we apply all the functions and compute the loss, then at each node we calculate the local gradients, and then we do a so-called backward pass where we compute the gradient of the loss with respect to our weights or parameters using the chain rule. these are the three steps we're going to do, and now we look at a concrete example. here we want to use linear regression, and if you don't know how this works then i highly recommend my machine learning from scratch tutorial about linear regression, i will put the link in the description. basically we model our output as a linear combination of some weights and an input, so our y hat, or predicted y, is w times x, and then we formulate a loss function. in this case this is the squared error; actually it should be the mean squared error, but for simplicity we just use the squared error, otherwise we would have another operation to get the mean. so the loss is the difference of the predicted y minus the actual y, and then we square it. now we want to minimize our loss, so we want to know the derivative of the loss with respect to our weights, and how do we get that? we apply our three steps. first we do a forward pass: we put in the x and the w, then we put in the y, apply our functions, and get the loss. then we calculate the local gradients at each node: here the gradient of the loss with respect to our s, then the gradient of s with respect to our y hat, and here at this node the gradient of y hat with respect to our w. then we do a backward pass, so we start at the end: first we have the derivative of the loss with respect to our s, then we use this and the chain rule to get the derivative of the loss with respect to the y hat, and then again we use this and the chain rule to get the final gradient of the loss with respect to our w. so let's do this with some concrete numbers. let's say x and y are given, so x is one and y is two in the beginning, these are our training samples, and we initialize our weight, so let's say for example our w is one in the beginning. then we do the forward pass: at the first node we multiply x and w, so we get y hat equals one, then at the next node we do a subtraction, so y hat minus y is 1 minus 2 equals minus 1, and at the very end we square our s, so we have s squared, so our loss then is 1.
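Written out compactly, the chain-rule computation that the next part carries out step by step, restating the quantities just defined, is:

\[
\hat{y} = w \cdot x, \qquad s = \hat{y} - y, \qquad \mathrm{loss} = s^2
\]
\[
\frac{\partial\,\mathrm{loss}}{\partial w}
= \frac{\partial\,\mathrm{loss}}{\partial s} \cdot \frac{\partial s}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w}
= 2s \cdot 1 \cdot x
\]

With x = 1, y = 2, and w = 1 this gives y hat = 1, s = -1, loss = 1, and a gradient of 2 · (-1) · 1 · 1 = -2, which is the value verified with autograd below.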
and now we calculate the local gradient so at the last node we have the gradient of the loss with respect to s and this is simple because we know the function so this is the gradient of s squared so this is just 2s and then at the next node we have the gradient of s with respect to y hat which is the gradient of the function y hat minus y with respect to y hat which is just one and then here at the last node we have the derivative of y hat with respect to w so this is the derivative of w times x with respect to w which is x and also notice that we don't need to go don't need to know the derivatives in this graph lines so we don't uh need to know what is the derivative of s with respect to y and also here we don't need this because our x and our y are fixed so we are only interested in our parameters that we want to update here and yeah and then we do the backward pass so first now we use our local gradients so we want to compute the derivative of the loss with respect to y hat and here we use the chain rule with our two local gradients that we just computed which is two s times one and s is minus 1 which we calculated up here and then so this is minus 2 and now we use this derivative and also this local gradient to then get the final gradient the gradient of the loss with respect to our w which is the gradient of the loss with respect to y hat times the gradient of y hat with respect to w which is minus two times x and x is one so the final gradient is minus two so this is the final gradient then that we know want to know and yeah that's all how backpropagation works and let's jump over to our code and verify that pytorch get these exact numbers so let's remember x is one y is two and w is one and then our first gradient should be minus two so let's see how we can use this in pi torch and first of all we import torch of course then we create our vector our tensor so we say x equals torch dot tensor and this is one and then our y equals torch dot tensor with two and then our initial weight is a tensor also with one so 1.0 to make it a float and here in with our weight we are interested in the gradient so we need to specify require scrut equals true and then we do the forward pass and gets and compute the loss so we simply say y hat equals w times x which is our function and then we say loss equals y hat minus the actual y and then we square this so we say this to the power of two and now let's print our loss and see um this is one in the beginning and now we want to do the backward pass so let's do the backward pass and pi touch will compute the local gradients automatically for us and also computes the backward pass automatically for us so the only thing that we have to call is say loss backward so this is the whole gradient computation and now our w has this dot grad attribute and we can print this and now this is the first gradient in the after the first forward and backward pass and remember this should be -2 in the beginning and here we see we have a tensor with -2 so this is working and the next steps would be for example now we update our weights and then we do the next forward and backward pass and do this for a couple of iterations and yeah that's how backpropagation works and how and also how easy it is to use it in pytorch and i hope you enjoyed this tutorial please subscribe to the channel and see you next time bye hi everybody welcome to a new pie torch tutorial in this tutorial i show you a concrete example of how we optimize our model with automatic gradient computation using 
the pi torch autograph package so we start by implementing the linear regression algorithm from scratch where we do every step manually so we implement the equations to calculate the model prediction and the loss function then we do a numerical computation of the gradients and implement the formula and then we implement the gradient decent algorithm to optimize our parameters when this is working we see how we can replace the manually computed gradients with the automatic back propagation algorithm from pi torch so this is step number two and in the third steps we in the third step we replace the manually computed loss and parameter updates by using the loss and optimizer classes in pi torch and in the final step we replace the manually computed model prediction by implementing a pi torch model so when we understood each of these steps pytorch can do most of the work for us of course we still have to to design our model and have to know which loss and optimizer we should use but we don't have to worry about the underlying algorithms anymore so now this video will cover steps one and two and in the next video we will see the steps three and four so let's start and i assume that you already know how linear regression and gradient decent works and if not then please watch my machine learning from scratch tutorial about this algorithm because now i will not explain all the steps in detail but i put the link in the description so now we do everything from scratch so we use only numpy so we import numpy s and p and then we use linear regression so we use a function which just just does a linear combination of some weights and our inputs and we don't care about the bias here so in our example let's say f equals two times x so our weight must be two and then let's do some training samples so let's say x equals numpy dot array and then we put some tests or training samples so let's say one two three and four and this will be of numpy now or let's get give this a data type on say this is numpy dot float 32 and then we also want a y and since our formula is this is 2x we have to multiply each of the values by 2 so 2 4 6 and 8. 
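For orientation, here is a compact sketch of the from-scratch numpy version that the next part builds step by step. The training data and learning rate are the ones used in the video; the number of iterations and the mean-based gradient formula are just reasonable choices for this sketch.

```python
import numpy as np

# f = 2 * x, so the weight we learn should approach 2
X = np.array([1, 2, 3, 4], dtype=np.float32)
Y = np.array([2, 4, 6, 8], dtype=np.float32)

w = 0.0

def forward(x):
    # model prediction
    return w * x

def loss(y, y_predicted):
    # mean squared error
    return ((y_predicted - y) ** 2).mean()

def gradient(x, y, y_predicted):
    # dJ/dw = mean(2 * x * (w*x - y))
    return np.mean(2 * x * (y_predicted - y))

learning_rate = 0.01
n_iters = 20

print(f'prediction before training: f(5) = {forward(5):.3f}')

for epoch in range(n_iters):
    y_pred = forward(X)                  # forward pass
    l = loss(Y, y_pred)
    dw = gradient(X, Y, y_pred)          # manually computed gradient
    w -= learning_rate * dw              # gradient descent update
    if epoch % 2 == 0:
        print(f'epoch {epoch + 1}: w = {w:.3f}, loss = {l:.8f}')

print(f'prediction after training: f(5) = {forward(5):.3f}')
```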
and now we initialize our weights so we simply say w equals zero in the beginning and now we have to um calculate our model prediction and we also have to calculate the loss and then we have to calculate the gradient so now we do each of these steps manually so let's define a function and we call this forward so this is a forward pass to follow the conventions of pi torch which will get x and then our model output is simply w times x so this is the forward pass um now the loss so here we define the function loss which depends on y and y predicted so the this is the model output um and now here in this case this is the or the loss equals the mean squared error in the case of linear regression and we can calculate this by saying this is um let's say y predicted minus y and then to the power of two and then we do the mean operation so this is the loss and now we manually have have to calculate the gradient of the loss with respect to our parameters so let's have a look at the mean squared error so the formula is 1 over n because it's the mean and then we have our y w times x so our prediction minus the actual value to the power of two and now if you want to have the derivative so the derivative of this let's call this j or objective function with respect to w equals 1 over n and then we have two times x and then times w times x minus y so this is the numerical computated computed derivative so please double check the math for yourself and now we implement this so we say define gradient which is dependent on x and y and also y predicted and now we can do this in one line so we return numpy dot we need a dot product of two times x and then here we have y predicted minus y so this is this formula here and then of course we also need the mean so let's say this is dot mean we can simply do it like this in numpy and now yeah these are the things we need now let's print our prediction before the training so let's print and we use an f string so prediction before training and let's say we want to predict the value 5 which should be 10 and here we can do in the f string we can actually use an expression so we can call this forward method and with five and let's say we only want three decimal values and now let's start our training so let's define some parameters so we need a learning rate which is let's say point zero one and then we need a number of iterations so we say n iters equals 10 and now let's do our training loop so we say for epoch in range and it errors and then first we do the prediction which is the forward pass so this is the forward pass um and we can simply do this with our function so we say y prediction or y pratt equals forward and then we put in our capital x and now we want to have the loss so our loss l equals the loss of um the actual y and our y predicted now we need to get the gradients so our gradients with respect to w so dw equals the gradient function that we just implemented which is dependent on x and y and the y predicted sorry y pratt and now we have to update our weights and yeah so the update formula in the gradient descent algorithm is just we go into the negative direction of the training of the gradient so minus x and then here the step width or the so-called learning rate times our the our gradient so this is the update formula and then let's say we also want to print some information here so we say if epoch modulo just say one here because now we want to print every step um if this is zero we want to print um let's say we want to print the epoch um and here we 
print epoch plus one and then we want to get the weight which is um [Music] the weight w um also just three decimal numbers and then we also want to have the loss so the loss equals the loss and here we say point eight let's say and yeah and then at the end we want to print the prediction after the training um so now let's predict predict a print prediction after training and yeah so now let's run this and see what happens so everything should be working and now so yeah before our training the prediction is zero and then for each step remember that our formula should be 2 times x so our w should be 2 in the beginning and we see that with each training step it it increases our weights and it decreases our loss so it gets better with every step and after the training our model prediction is 9.999 so it's almost there so let's say for example now we want to have more iterations here say we only did we only did 10 iterations which is not much now if we run this and let's print every second step only then we see in the end our loss is zero and the prediction is correct now this is the implementation where we did everything manually and now let's replace the gradient calculation so let's select all of this and copy this into a separate file and now we don't use numpy anymore so now let's only import torch and do everything with pi torch and of course what we now want to get rid of is this gradient the the manually computed gradients so we simply delete this we don't need this anymore and now we don't have numpy arrays so this is now a torch dot tensor and our data type is now a torch dot float 32 and the same with our y which is now a torch dot tensor and also the data type is from the torch module but everything else is the same here so the same syntax and now our w also has to be a tensor so let's say this is a torch dot tensor um with zero point zero in the beginning and it also gets a data type of say torch dot float 32 and since we are interested in the gradient of our loss with respect to this parameter we need to specify that this requires the gradient computation so require scrat equals true now the forward function and the loss function is still the same because we can use the same syntax in pi torch and now in our training loop the forward pass is still the same the loss is the same and now the gradient this is the equal to the backward pass so remember in back propagation we first do a forward pass that's why we use the syntax and then later for the gradients we use the backward pass so here we simply call l dot backward and this will calculate the gradient of our loss with respect to w so pi touch does all the computations for us and now we update our weights but here we want to be careful and i explained this in the tutorial about the autograph package because we don't want to be this operation should not be part of our um gradient tracking graph so this should not be part of the computational graph so we need to wrap this in a with torch dot no grad statement and then one more thing that we should also know and i also talked about this already we must empty or zero the gradients again because whenever we call backward it will write our gradients and accumulate them in the w dot grad attribute so before the next iteration we want to make sure that our gradients are zero again so we can say w times graph times zero underscore so this will modify it in place and now we are done and now let's run this and see if this is working and w is not defined um oh yeah okay of course this is now 
w dot grad and let's try this again and now it's working and now we also see that it will increase our w and it will decrease our loss and here we said we had 20 iterations but it's not correct not completely correct and this is because the backward or the back propagation is not as exact as the numerical gradient computation so let's try some more iterations here let's say we want 100 iterations and print our update every 10th step so let's try this again and now we see after the training is done we have the correct correct prediction so yeah that's it for this video and in the next video we will continue here and replace the manually computed loss and weight updates with pytorch loss and optimizer classes so if you like this video please subscribe to the channel and see you next time bye hi everybody welcome back to a new pie torch tutorial in the last tutorial we implemented logistic regression from scratch and then learned how we can use pytorch to calculate the gradients for us with back propagation now we will continue where we left off and now we are going to replace the manually computed loss and parameter updates by using the loss and optimizer classes in pi torch and then we also replace the manually computed model prediction by implementing a pi torch model then pitoch can do the complete pipeline for us so this video covers steps three and four and please watch the previous tutorial first to see the steps one and two so now let's start and first i want to talk about the general training pipeline in pytorch so typically we have three steps so the first step is to design our model so we design the number of inputs and outputs so input size and output size and then also we design the forward pass with all the different operations or all the different layers then as a second step we design or we come up with the so we construct the loss and the optimizer and then as a last step we do our training loop so this the training loop so we start by doing our forward pass so here we compute or let's write this down compute the prediction then we do the backward pass backward pass so we get the gradients and pi torch can do everything for us we only have to define or to design our model so and after we have the gradients we can then update our weights so now we update our weights and then we iterate this a couple of time until we are done and that's the whole pipeline so now let's continue and now let's replace the loss and the optimization so for this we import the neural network module so we import torch dot n n s n n so we can use some functions from this and now we don't want to define the loss manually anymore so we can simply delete this and now um down here before our training we still need to define our loss so we can say loss equals and here we can use a loss which is provided from pi torch so we can say nn dot mse loss which is exactly what we implemented before so this is the mean squared error and this is a callable function and then we also want a optimizer from pi charge so we say optimizer equals torch dot optim from the optimization module and then here we use sgd which stands for stochastic gradient descent which will need some params some parameters that it should optimize and it will need this as a list so we put our w here and then it also needs the lr so the learning rate which is our previously defined learning rate and then in our training loop um so the loss computation is now still the same because this is a callable function which gets the actual y and the 
predicted y and then we don't need to manually update our weights anymore so we can simply say optimizer dot step which will do an optimization step and then we also still have to empty our gradients after the optimization step so we can say optimizer dot zero grad and now we are done with step three so let's run this to see if this is working and so yeah it's still working our prediction is good after the training and let's continue with step four and replace our manually implemented forward method with the with a pi torch model so um for this um let's we also don't need our weights anymore because then our pie torch model knows the parameters so um here we say model equals nn dot linear so usually we had have to design this for ourself but since this is very trivial for linear regression so this is only one layer this is already provided in pi torch so this is nn.linear and this needs an input size and an output size of our features and for this we need to do some modification so now our x and y need to have a different shape so this must be a 2d array now where the number of rows is the number of samples and for each row we have the number of or the not the features so this has a new shape um sorry a new shape um [Music] that looks like this and the same for our y so our y is the same shape now so 2 4 6 and 8 so now let's um get the shape so this is why be careful now so we can say number of samples and number of features equals x dot shape and now let's print this so print the number of and the number of features and now let's run this so this will run into an error but i think we get until here so the shape is now four by one so we have four samples and one feature for each sample and now we define our models so this needs an input and an output size so the input input size equals the number of features and the output size output size is still the same so this is also the number of features so this is one as an input size and one as an output size now we need to give this to our model so we say here input size and output size and then one more or then when we want to get the prediction we can simply say we can call them model um but now this cannot have a float value so this must be a tensor so let's create a test tensor let's say x test equals torch dot tensor which gets only one sample with five and then it gets a data type of say torch dot float 32 and then here we passed the test sample and since this is only one well has only one value we can call the dot item to get the actual float value then so now let's copy and paste this down here um and now we also have to modify our optimizer here so we don't have our weights now so this list with the parameters here we can simply say model dot parameters and call this function and now here for for the prediction we also we simply call the model and now we are done so now we are using the pi torch model to get this and also down here now if we want to print them again we have to unpack them so let's say w and an optional bias equals model parameters this will unpack them and then if we want to print the actual this will be a list of lists so let's get the first or the actual first weight with this and we can also call the item because we don't want to see the tensor and now i think we are done so let's run this to see if this is working and yeah so the final output is not um perfect so this might be because the initialization now is randomly and also this optimizer technique might be a little different so you might want to play around 
play around with the learning rate and the number of iterations but basically it works and it gets better and better with every step and yeah so this is how we can construct the whole training pipeline and um one more thing so in this case um we didn't have to have to come up with the model for ourselves so here we only had one layer and this was already provided in pi torch but let's say we need a custom model so let's write a custom linear regression model then we have to derive this from nn dot module and this will get a init method which has self and which gets the input dimensions and the output dimensions and then here we call super the super class so super um of linear regression with self and then dot init this is how we call the super constructor and here we would define our layers so in this case we say our self dot lin or linear layer equals nn dot linear and this will get the input dimension and the output dimension and then we store them here and then we also have to implement the forward pass in our model class so itself and x and here we can simply return self dot linear of x and this is the whole thing and now we can say our model equals uh linear regression with the input size and the output size and now this will do the same thing so now this is just a dummy example because this is a simple wrapper that will do exactly the same but basically this is how we design our pi touch model so now let's comment this out and use this class to see if this is working and yeah so it's still working so that's all for now and now pi touch can do most of the work for us of course we still have to design our model and have to know which loss and optimizer we want to use but we don't have to worry about the underlying algorithms anymore so yeah you can find all the code on github and if you like this please subscribe to the channel and see you next time bye hi everybody welcome back to a new pie torch tutorial this time we implement linear regression so we already implemented this step by step in the last couple of tutorials and this should be a repetition where we can apply all the learned concepts and quickly implement our algorithm again so as i've shown you before our typical pytorch pipeline consists of those three steps first we design our model so we define the input and the output size and then the forward pass then we create our loss and optimizer functions and then we do the actual training loop with the forward pass the backward pass and the weight updates so let's do this and first of all we import a couple of things that we need so let's import torch then we import torch dot nn n n so the neural network module then we import numpy as np just to make some data transformations and then from sk learn we import data sets so we want to generate a regression data set and then we also want to plot this later so i say import matplotlib dot pi plot as plt and then we do our three steps so we design the model step number one then step number two we define the loss and the optimizer and then step number three our training loop so let's do this and first of all let's do a step 0 where we prepare our data so prepare data so let's generate a regression data set and we can do this by saying let's call this x numpy and y numpy equals and then we can use data sets dot make regression which gets let's say 100 samples so n samples equals 100 and only one feature in this example so n features equals one then we add some noise and let's also add a random state let's say this is one and then we 
want to convert this to a torch tensor so we say x equals and then we can use the function torch dot from numpy and then we say x dot x underscore numpy but we want to convert this to a float32 data data type before so right now this is a double data type so if we use a double here then we will run into some errors later so let's just convert this by saying s type and then say numpy dot float 32 and we do the same thing for our y so we say y equals the torch tensor from our numpy array and now let's also reshape our y because right now this is a has only one row and we want to make it a column vector so we want to put each value in one row and the whole shape has only one column so let's say y equals y dot view and here we put in the new size so y dot shape zero so the number of values and then only one column so the view method is a built-in pi torch method which will reshape our tensor and then let's get the number of samples and the number of features by saying this is x dot shape so we can use this in a second and now let's do our three steps so now we have the data now we define the model and in the linear regression case this is just one layer so our so we can use to build in linear model and so we say model equals n n dot linear and this is the linear layer which needs a input size of our features and a output size so let's say input size equals this is the number of features we have so this is just one in our example and the output size equals one so we only want to have one value for each sample that we want to put in so our model gets now the input and the output size so input size and output size and this all we have to do to set up the model and now let's continue with the loss and the optimizer so let's call this criterion and here we can use a built-in loss function from pi torch and in the case of linear regression this is the mean squared error so we can say this is nn dot mse loss this will calculate the mean squared error so this is a callable function and then we also set up the optimizers so we say optimizer equals and let's say torch dot optim dot sgd so this is stochastic gradient descent and our optimizer needs the parameters that it should optimize so here we can simply say this is model dot parameters and then it needs a learning rate so let's define this here as a variable so let's say learning rate equals let's say 0.01 and then lr equals learning rate so this is step number two and now let's do our training loop so first of all let's define the number of epochs let's say we want to do 100 training iterations and now for epoch in range num epochs and now here we do our steps in the training loop the forward pass the backward pass and the update and the weight updates so let's first of all do the forward pass and also the loss here then the backward pass and then the update so the forward pass and the loss here we can say y predicted equals and here we call our model and as a data it gets x and so this is the forward pass and then we compute the loss by saying loss equals this is our cri we call this criterion and this needs the actual labels and the predicted values so why predicted and why and now in the backward pass to calculate the gradients we just say loss dot backward so this will do the back propagation and calculate the gradients for us and then our update here we simply say optimizer dot step so this will update the weights and then before the next iteration we have to be careful so we have to empty our gradients now because whenever we call the 
backward function, the gradients are summed up into the .grad attribute, so we have to empty them again by calling optimizer.zero_grad(), and you should never forget this step. That completes the training loop. Let's also print some information: if (epoch + 1) % 10 == 0, so every tenth step, we print the current epoch and the loss with loss.item(), formatted to four decimal places. Now we are done, so let's also plot the result. We get the final predictions by calling the trained model on all the data, and before converting back to NumPy we call detach() on the tensor: we want to prevent this operation from being tracked in the computational graph, and detach() returns a new tensor with the gradient-calculation attribute set to False, which we can then convert with numpy(). Then we plot the data as red dots with plt.plot(X_numpy, y_numpy, 'ro'), plot the approximated function in blue with plt.plot(X_numpy, predicted, 'b'), and call plt.show(). Running this and hoping everything is correct, the plot shows that the line is a pretty good approximation of our data, so everything is working and we're done.
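For reference, here is a compact, runnable sketch of the linear regression pipeline described above; the noise level and the exact variable names are my own choices, and the built-in nn.Linear layer could be swapped for a custom nn.Module wrapper, as discussed earlier, without changing the behavior.

```python
import torch
import torch.nn as nn
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt

# 0) prepare data: one feature, float32 tensors, y reshaped into a column vector
X_numpy, y_numpy = datasets.make_regression(n_samples=100, n_features=1, noise=20, random_state=1)
X = torch.from_numpy(X_numpy.astype(np.float32))
y = torch.from_numpy(y_numpy.astype(np.float32)).view(-1, 1)
n_samples, n_features = X.shape

# 1) model: a single built-in linear layer (a custom nn.Module wrapper around
#    nn.Linear, as in the previous section, would behave exactly the same)
model = nn.Linear(n_features, 1)

# 2) loss and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 3) training loop: forward pass, loss, backward pass, update, reset gradients
for epoch in range(100):
    y_pred = model(X)
    loss = criterion(y_pred, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if (epoch + 1) % 10 == 0:
        print(f'epoch {epoch + 1}: loss = {loss.item():.4f}')

# detach the final predictions so they are no longer tracked in the graph
predicted = model(X).detach().numpy()
plt.plot(X_numpy, y_numpy, 'ro')
plt.plot(X_numpy, predicted, 'b')
plt.show()
```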
I hope you enjoyed this; if you liked it, please subscribe to the channel, and see you next time. Bye.

Hi everybody, welcome back to a new PyTorch tutorial. This time we implement logistic regression; if you've watched the previous tutorials, this should be very easy. Once again we follow the typical PyTorch pipeline with three steps: first we set up the model, defining the input and output size and the forward pass; then we create the loss and optimizer functions; and then we write the actual training loop with the forward pass, the backward pass, and the weight updates. The code is very similar to the linear regression tutorial; we only make slight adjustments: we add one more step (the sigmoid) to the model and select a different loss function from PyTorch's built-in functions. First the imports: torch, torch.nn as nn, numpy as np for some data transformations, datasets from sklearn to load a binary classification dataset, StandardScaler from sklearn.preprocessing because we want to scale our features, and train_test_split from sklearn.model_selection to separate training and testing data. As step 0 we prepare the data: we load the breast cancer dataset from sklearn with bc = datasets.load_breast_cancer(), a binary classification problem where we predict cancer based on the input features. We set X, y = bc.data, bc.target and get n_samples, n_features = X.shape; printing them shows 569 samples and 30 features, so quite a lot of features. Then we split the data with train_test_split(X, y, test_size=0.2, random_state=1234). Next we scale the features: we set up sc = StandardScaler(), which gives the features zero mean and unit variance, which is always recommended for logistic regression, and apply X_train = sc.fit_transform(X_train) and X_test = sc.transform(X_test), using only transform on the test data. Then we convert everything to torch tensors with torch.from_numpy, casting to float32 first with astype(np.float32), because the arrays are of type double and would cause errors later; we do this for X_train, X_test, y_train, and y_test. The last data preparation step is to reshape the y tensors: y_train = y_train.view(y_train.shape[0], 1), and the same for y_test, because right now y has only one row and we want a column vector with one value per row. Now the model: in the logistic regression case the model is a linear combination of weights and a bias with a sigmoid function applied at the end, so let's write our own class, LogisticRegression, derived from nn.Module. Its __init__ gets self and the number of input features, calls the super constructor, and defines the single layer self.linear = nn.Linear(n_input_features, 1), since we only want one output value, one class label, at the end. We also implement the forward pass, which gets self and the data, and in the forward pass, first we apply
the linear layer and then the sigmoid function so here we say y predicted equals torch dot sigmoid so this is also a built-in function that we can use and here we apply our self.linear layer so linear layer with our data x and then we return our y predicted so this is our model and now let's create this so model equals logistic regression of size and here we put in the number of features that we have so now our layer is of size 30 by one and no sorry 30 input features and one output feature and now we have our model and now we can continue with the loss and the optimizer so for a loss the loss function now is different than in the linear regression case so here we say criterion equals nn dot bce loss so the binary cross entropy loss here and our optimizer is the same so this can be um this is torch dot optim dot s g d for stochastic gradient descent and this gets some parameters that we want to optimize so here we just say model dot parameters and it also needs a learning rate so let's say our learning rate equals 0.01 and then here we say lr equals learning rate so now this is step two and now step three so let's define some number of epochs equals let's say 100 iterations and now we do our training loop so now we do for epoch in range num epochs and then first we do the forward pass forward pass and the loss calculation then we do the backward pass and then we do the updates so let's say y predicted equals here we call our model um and as data it gets x train and then we say loss equals criterion and this will get a the y predicted and the actual y training so the training samples or the training labels and now we do the backward pass and calculate the gradients and again we simply have to call lost at dot backward and pi torch will do all the calculations for us and now we update our weights so here we simply have to say optimizer dot step and again pi torch will do all the update calculations for us and then we don't or we must not forget to empty our gradients again so one two zero the gradients because the backward function here will always add up all the gradients into the dot grad attribute so let's empty them again before the next iteration and we simply must say optimizer dot zero grad and then let's also print some information if epoch plus one modulo 10 equals equals zero so every tenth step we want to print some information let's use an f string here um let's say epoch and here we can use epoch plus one and then we also want to see the loss so the loss equals loss dot item and let's format this to say to only print four decimal values and yeah so now we are done this is our logistic regression implementation and now let's evaluate our model so the evaluation should not be part of our computational graph where we want to track the history so we want to say with torch dot no great and then do our evaluation here so here i want to get the accuracy so let's get all the predicted classes from our test um samples so let's say this is model and here we put in x test and then let's convert this to class labels so zero or one so remember the sigmoid function here will return a value between zero and one and if this is larger than point five we say this is class one and otherwise it's class zero so let's say y predicted classes equals y predicted dot round so here we can use a built in function again and this will do exactly this and yeah so if we do don't use this statement then this would be part of the computational graph and it would track the gradient calculations for us so here we 
don't want that and don't need it, because we are done training; that's why we use the with statement here. Now we calculate the accuracy: acc = y_predicted_classes.eq(y_test).sum() / float(y_test.shape[0]), so every correct prediction adds one to the sum, and we divide by the number of test samples, which y_test.shape[0] returns. Then we print the accuracy, formatted to four decimal places. Let's run this and hope everything is correct: the first run complains that StandardScaler has no attribute 'transform' because of a typo, so after fixing it and trying again we are done and get an accuracy of 0.89. That's okay, it's good but not perfect, so you might want to play around with the number of epochs, the learning rate, or maybe try a different optimizer. But basically that's how we implement logistic regression.
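Putting the whole logistic regression tutorial together, a condensed sketch might look like the following; the hyperparameter values match the ones used above, while the variable names and print formatting are my own choices.

```python
import torch
import torch.nn as nn
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# 0) prepare the binary classification data
bc = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(bc.data, bc.target, test_size=0.2, random_state=1234)

sc = StandardScaler()  # zero mean, unit variance
X_train = torch.from_numpy(sc.fit_transform(X_train).astype(np.float32))
X_test = torch.from_numpy(sc.transform(X_test).astype(np.float32))
y_train = torch.from_numpy(y_train.astype(np.float32)).view(-1, 1)
y_test = torch.from_numpy(y_test.astype(np.float32)).view(-1, 1)

# 1) model: linear layer followed by a sigmoid
class LogisticRegression(nn.Module):
    def __init__(self, n_input_features):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(n_input_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

model = LogisticRegression(X_train.shape[1])

# 2) loss and optimizer
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 3) training loop
for epoch in range(100):
    y_pred = model(X_train)
    loss = criterion(y_pred, y_train)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# evaluation, outside the computational graph
with torch.no_grad():
    y_pred_classes = model(X_test).round()
    acc = y_pred_classes.eq(y_test).sum().item() / y_test.shape[0]
    print(f'accuracy = {acc:.4f}')
```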
I hope you liked it; if you did, please subscribe to the channel, and see you next time. Bye.

Hi everybody, welcome back to a new PyTorch tutorial. Today I want to show you the PyTorch Dataset and DataLoader classes. So far our code looked something like this: we had a dataset that we loaded somehow, for example from a CSV file, and a training loop that looped over the number of epochs and optimized the model based on the whole dataset. This can be very time consuming if we compute gradients on the whole training data, so for large datasets a better way is to divide the samples into smaller, so-called batches. The training loop then looks like this: we loop over the epochs, then we loop over all the batches, get the x and y batch samples, and do the optimization based only on those batches. If we use the built-in Dataset and DataLoader classes from PyTorch, PyTorch can do the batch calculations and iterations for us, so it's very easy to use. Before we jump into the code, a few terms you should know for batch training: one epoch means one complete forward and backward pass over all training samples; the batch size is the number of training samples in one forward and backward pass; and the number of iterations is the number of passes, where each pass uses batch-size samples. For example, with 100 samples and a batch size of 20 we have 5 iterations per epoch, because 100 divided by 20 is 5. Now to the code. I've already imported the modules we need: torch, torchvision, Dataset and DataLoader from torch.utils.data, numpy, and math. Let's implement our own custom dataset, a WineDataset class, which must inherit from Dataset and implement three things: __init__, where we do the data loading; __getitem__, which gets self and an index and allows indexing, so we can call dataset[0]; and __len__, which allows us to call len() on the dataset. We'll work with the wine dataset; the CSV file is also in my GitHub repository, so you can check it out there. The first row is the header, the very first column is the class label (there are three wine categories: 1, 2, and 3), and all the other columns are features. In __init__ we load it with xy = np.loadtxt('./data/wine/wine.csv', delimiter=',', dtype=np.float32, skiprows=1); the delimiter is a comma because it's a comma-separated file, and we skip the first row because it's the header. Then we split the data into x and y using slicing: self.x = xy[:, 1:], all samples and every column except the first, and self.y = xy[:, [0]], all samples and only the very first column, wrapped in another set of brackets so the shape is n_samples by 1, which makes some calculations easier later. We can also convert both to tensors right here with torch.from_numpy; we don't have to, we could convert later, but let's do it now. We also store self.n_samples = xy.shape[0], the first dimension, and __len__ simply returns self.n_samples. The __getitem__ method can be one line: return self.x[index], self.y[index], which returns a tuple. That's our whole dataset. Now let's create it, dataset = WineDataset(), and look at the very first sample: first_data = dataset[0], which we unpack into features and labels.
Printing the features and the labels shows one feature row vector and the label, 1 in this case, so indexing works. Now let's see how to use a DataLoader: dataloader = DataLoader(dataset=dataset, batch_size=4, shuffle=True, num_workers=2); shuffle=True shuffles the data, which is very useful for training, and num_workers=2 is optional but can make loading faster because it uses multiple subprocesses. To use the data loader we can turn it into an iterator, dataiter = iter(dataloader), call next on it to get one batch, and unpack that into features and labels. Printing them shows four different feature vectors, because I specified a batch size of 4, and a labels tensor with four class labels, one for each feature vector. We can also iterate over the whole data loader instead of only getting the next item, so let's write a dummy training loop. First some hyperparameters: num_epochs = 2 and total_samples = len(dataset); the number of iterations per epoch is the total number of samples divided by the batch size, rounded up with math.ceil, so n_iterations = math.ceil(total_samples / 4). Printing them shows 178 samples and 45 iterations. Then the loop: for epoch in range(num_epochs), and inside it for i, (inputs, labels) in enumerate(dataloader); the enumerate function gives us the index plus the already unpacked inputs and labels. In a real training loop we would now do the forward and backward pass and update the weights; here, as a dummy example, we just print some information every fifth step: if (i + 1) % 5 == 0, we print the current epoch out of num_epochs, the current step i + 1 out of n_iterations, and inputs.shape. Running this shows two epochs with 45 steps each, printing every fifth step, and the inputs tensor is 4 by 13: the batch size is 4 and each sample has 13 features.
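Here is a self-contained sketch of the WineDataset and the DataLoader loop described above; the CSV path is an assumption (point it at your local copy of wine.csv), and I use num_workers=0 so the sketch runs everywhere (use 2 for multi-process loading as in the walkthrough).

```python
import math
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class WineDataset(Dataset):
    def __init__(self):
        # path is an assumption: adjust it to wherever your wine.csv lives
        xy = np.loadtxt('./data/wine/wine.csv', delimiter=',', dtype=np.float32, skiprows=1)
        self.x = torch.from_numpy(xy[:, 1:])    # the 13 feature columns
        self.y = torch.from_numpy(xy[:, [0]])   # class label, shape (n_samples, 1)
        self.n_samples = xy.shape[0]

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return self.n_samples

dataset = WineDataset()
batch_size = 4
dataloader = DataLoader(dataset=dataset, batch_size=batch_size, shuffle=True, num_workers=0)

num_epochs = 2
n_iterations = math.ceil(len(dataset) / batch_size)

for epoch in range(num_epochs):
    for i, (inputs, labels) in enumerate(dataloader):
        # forward pass, backward pass, and weight update would go here
        if (i + 1) % 5 == 0:
            print(f'epoch {epoch + 1}/{num_epochs}, step {i + 1}/{n_iterations}, inputs {inputs.shape}')
```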
So that's how we use the Dataset and DataLoader classes to easily get single batches. PyTorch also ships a number of built-in datasets: for example torchvision.datasets gives us the famous MNIST dataset, as well as Fashion-MNIST, CIFAR, and COCO, and MNIST is one we will use in one of the next tutorials. That's all I wanted to show you about the Dataset and DataLoader classes. I hope you liked it; please subscribe to the channel, and see you next time. Bye.

Hi everybody, welcome back to a new PyTorch tutorial. This time we talk about transforms for our datasets. In the last tutorial we used the built-in Dataset and DataLoader classes, and if we use a built-in dataset we can pass it a transform argument: in this example we use the built-in MNIST dataset and apply the ToTensor transform, which converts images or NumPy arrays to tensors. PyTorch already has a lot of transforms implemented for us, so have a look at the official documentation at the linked page to see everything that's available: there are transforms that can be applied to images, for example CenterCrop, Grayscale, or Pad; transforms that can be applied to tensors, like LinearTransformation or Normalize; conversion transforms, for example ToPILImage and ToTensor; generic transforms, where we can use lambdas or write our own custom class; and transforms.Compose, which takes a list and applies multiple transforms one after another. In the last tutorial we implemented a custom wine dataset, so now let's extend that class to support transforms and write our own transform classes. I copied the code from the last tutorial, where the dataset loads the data and implements __getitem__ and __len__ for indexing and the len() function. To support transforms we add a transform argument to __init__, optional with a default of None, and store it with self.transform = transform. We also change __getitem__: we build the sample, then say if self.transform (so if it's not None), sample = self.transform(sample), and return the sample. That's all the dataset needs. Now let's write some custom transform classes, for example our own ToTensor class. In the last tutorial we already converted to tensors right in __init__, but we don't need to do that; we can leave the data as NumPy arrays and implement a ToTensor class that is passed to the dataset and does the conversion later. The only thing such a class needs to implement is the __call__ method,
which gets self and a sample; that makes the object callable. Inside, we first unpack the sample into inputs and targets and then return torch.from_numpy(inputs), torch.from_numpy(targets), so we still return a tuple as before. That's all we need for the ToTensor transform, and now we can pass it to the dataset: WineDataset(transform=ToTensor()). Let's look at the first item, dataset[0], unpack it into features and labels, and print their types: both are now torch tensors, and if we pass no transform (None) they stay NumPy ndarrays. That's how we write our own transform and apply it to our own dataset. Now let's write another custom transform, say MulTransform, a multiplication. It gets an __init__ with self and a factor argument, stored as self.factor, and again a __call__ method that gets self and the sample. We unpack the sample into inputs and target, apply the factor only to the features with inputs *= self.factor, and return the modified inputs and the target, still as a tuple. That's the multiplication transform. To see how to combine transforms, let's apply a composed transform: composed = torchvision.transforms.Compose([ToTensor(), MulTransform(2)]), and create a new WineDataset(transform=composed). Looking at the first item again and printing the features, we now have a tensor and every value got doubled; with a factor of 4, every value is multiplied by four instead. So this is how we can use transforms with our own datasets, and it's very useful. Most of the time you'll see the ToTensor conversion, but a lot of the other transforms show up when working with images, so check them out on the documentation page.
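To summarize the transform mechanics, here is a compact sketch of the two transform classes and the composed pipeline just described; the dummy sample at the end stands in for dataset[0] so the snippet runs without the wine CSV, and the dataset is assumed to apply self.transform in __getitem__ as described above.

```python
import numpy as np
import torch
import torchvision

class ToTensor:
    # callable transform: converts an (inputs, targets) pair of numpy arrays to torch tensors
    def __call__(self, sample):
        inputs, targets = sample
        return torch.from_numpy(inputs), torch.from_numpy(targets)

class MulTransform:
    # callable transform: multiplies only the features by a constant factor
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, sample):
        inputs, targets = sample
        return inputs * self.factor, targets

# inside the dataset, __getitem__ would do:
#     sample = self.x[index], self.y[index]
#     if self.transform:
#         sample = self.transform(sample)
#     return sample

composed = torchvision.transforms.Compose([ToTensor(), MulTransform(4)])

# quick check on a dummy sample (a stand-in for dataset[0])
sample = (np.array([1.0, 2.0, 3.0], dtype=np.float32), np.array([1.0], dtype=np.float32))
features, label = composed(sample)
print(type(features), features)   # torch tensor, each feature value multiplied by 4
```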
I hope you liked this tutorial; please subscribe to the channel, and see you next time. Bye.

Hi everybody, welcome back to a new PyTorch tutorial. This time we talk about the softmax function and the cross-entropy loss. These are among the most common functions used in neural networks, so you should know how they work. I'll explain the math behind them, show how to use them in NumPy and then in PyTorch, and at the end show what a typical classification neural network with these functions looks like. So let's start with the formula of the softmax: it applies the exponential function to each element and normalizes it by dividing by the sum of all these exponentials. What it does is basically squash the outputs to be between 0 and 1, so we get probabilities. For example, say we have a linear layer with three output values; these raw values are so-called scores or logits. Applying the softmax gives probabilities: each value is squashed to lie between 0 and 1, the highest logit gets the highest probability, and the three probabilities sum to 1. The prediction is then the class with the highest probability. That's how the softmax works, so now let's look at the code. In NumPy we can compute it in one line: the exponential divided by the sum over all the exponentials, np.exp(x) / np.sum(np.exp(x), axis=0). Running this with the same values as on the slide confirms that the highest logit gets the highest probability (I rounded the values on the slides, so they differ slightly, but it's correct). In PyTorch we create a tensor with the same values and call outputs = torch.softmax(x, dim=0), specifying the dimension so it's computed along the first axis; printing the outputs shows almost exactly the same result, so this works.
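A minimal sketch of both versions, using example logits of my own choosing:

```python
import numpy as np
import torch

def softmax(x):
    # exponentiate each score and normalize by the sum of the exponentials
    return np.exp(x) / np.sum(np.exp(x), axis=0)

x = np.array([2.0, 1.0, 0.1])
print('softmax numpy:', softmax(x))               # roughly [0.66, 0.24, 0.10]

x = torch.tensor([2.0, 1.0, 0.1])
print('softmax torch:', torch.softmax(x, dim=0))  # the same values as a tensor
```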
A lot of times the softmax function is combined with the so-called cross-entropy loss. It measures the performance of a classification model whose output is a probability between 0 and 1, and it can be used in multi-class problems. The loss increases as the predicted probability diverges from the actual label: the better the prediction, the lower the loss. On the slide we have two examples, a good prediction with a low cross-entropy loss and a bad prediction with a high one. What we also must know is that here the actual y must be one-hot encoded: say we have three possible classes, 0, 1, and 2, and the correct label is class 0, then we put a 1 for class 0 and a 0 for all the other classes; if the correct class were 1, it would look like [0, 1, 0] instead. The predicted y must be probabilities, for example after applying the softmax. In NumPy we compute the cross-entropy as minus the sum over the actual labels times the log of the predicted labels, -np.sum(actual * np.log(predicted)); we could also normalize it by dividing by the number of samples, but we don't do that here. Then we create the one-hot y and two predictions, which are probabilities: the first is a good prediction because class 0 gets the highest probability, and the second is a bad prediction because class 0 gets a very low probability and class 2 a high one. Computing the cross-entropy for both shows a low loss for the first prediction and a high loss for the second. Now the same in PyTorch: we create the loss with loss = nn.CrossEntropyLoss(), and here we have to be careful about two things. First, nn.CrossEntropyLoss already applies the log-softmax and then the negative log-likelihood loss, so we should not (must not) add a softmax layer ourselves. Second, the actual y must not be one-hot encoded; we only pass the correct class label, and the y predictions are raw scores (logits), again with no softmax. So in practice: Y = torch.tensor([0]), just the class label 0, not one-hot encoded anymore. For a good prediction, Y_pred_good = torch.tensor([[2.0, 1.0, 0.1]]); we have to be careful about the size, which must be number of samples times number of classes, here 1 by 3, so it's an array of arrays, the values are raw (no softmax applied), and class 0 has the highest value. For a bad prediction we make the very first value low and the second value high. Then we compute l1 = loss(Y_pred_good, Y) and l2 = loss(Y_pred_bad, Y) and print l1.item() and l2.item(), since each has only one value; the good prediction has the lower cross-entropy loss, so this works. To get the actual class predictions we say _, predictions = torch.max(Y_pred_good, 1), along the first dimension, and the underscore because we don't need the values themselves, and the same for the bad prediction; printing them shows that we always choose the class with the highest raw value, class 0 for the good prediction and class 1 for the bad one. So this is how we get the predictions.
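A minimal sketch of the PyTorch version with one sample; the logits for the bad prediction are values of my own choosing:

```python
import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()   # applies log-softmax + negative log-likelihood internally

# the target is the class index (NOT one-hot): 1 sample, correct class = 0
Y = torch.tensor([0])

# predictions are raw logits of shape (n_samples, n_classes), no softmax applied
Y_pred_good = torch.tensor([[2.0, 1.0, 0.1]])   # highest logit at index 0
Y_pred_bad = torch.tensor([[0.5, 2.0, 0.3]])    # highest logit at index 1

print(loss(Y_pred_good, Y).item())   # low loss
print(loss(Y_pred_bad, Y).item())    # higher loss

# the actual class prediction is the index of the maximum logit
_, prediction = torch.max(Y_pred_good, 1)
print(prediction)                    # tensor([0])
```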
What's also very good is that the PyTorch loss allows multiple samples. Let's increase the number of samples to three: the actual y tensor then needs three class labels, for example torch.tensor([2, 0, 1]), and the predictions must have size number of samples times number of classes, so 3 by 3. For a good prediction, the correct label of the first sample is class 2, so the third raw value in the first row must be the highest and the others low, say 0.01; in the second row the very first value must be the highest; and in the third row the value in the middle must be the highest. For the bad prediction we shift the high values to the wrong positions. Computing the cross-entropy loss with these multiple samples again shows that the first prediction is good and has a low loss and the second one is not so good, and torch.max on the good prediction tensor returns 2, 0, 1, matching the actual y. So that's how we can use the cross-entropy loss in PyTorch. Now let's go back to the slides: I want to show you how a typical neural network looks. Here is a typical net for a multi-class classification problem, where we want to find out which animal an image shows: we have an input layer, some hidden layers with activation functions in between, and at the end a linear layer with one output for each class, followed by the softmax to get probabilities. As I said, in PyTorch we have to be careful, because we use the cross-entropy loss, so we must not implement the softmax layer in the network ourselves. The code for such a multi-class net looks like this: we define one linear layer that takes the input size to a hidden size, an activation function in between, and a last linear layer from the hidden size to the output size, which is the number of classes, one output per possible class; in the forward method we only apply the layers, with no softmax at the very end; then we create the model and use the cross-entropy loss, which applies the softmax for us. This example also works for more classes, for instance if the image could also be a bird or a mouse. But if we just have a binary classification problem with two possible outputs, we can change the layout: we rephrase the question and ask, is it a dog, yes or no? Then the last linear layer has only one output, and instead of the softmax we use the sigmoid function, which gives a probability, and if it is higher than 0.5 we say yes. In PyTorch we then use the BCE loss, the binary cross-entropy loss, and in this case we must implement the sigmoid function ourselves at the end of the network. So in the binary classification case, we again set up the layers and activation functions, the last layer always has output size 1, in the forward pass we also apply the sigmoid after the layers, and as the criterion we use the binary cross-entropy loss. Be very careful about these two different possible network layouts.
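To make the difference concrete, here is a sketch of both layouts; the layer sizes are placeholders of my own choosing.

```python
import torch
import torch.nn as nn

# multi-class: one output per class, NO softmax in forward,
# because nn.CrossEntropyLoss applies it for us
class NeuralNetMultiClass(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNetMultiClass, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.relu(self.linear1(x))
        return self.linear2(out)          # raw logits

model_mc = NeuralNetMultiClass(input_size=28 * 28, hidden_size=5, num_classes=3)
criterion_mc = nn.CrossEntropyLoss()

# binary: exactly one output and we DO apply the sigmoid ourselves,
# because nn.BCELoss expects probabilities
class NeuralNetBinary(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(NeuralNetBinary, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out = self.relu(self.linear1(x))
        return torch.sigmoid(self.linear2(out))

model_bin = NeuralNetBinary(input_size=28 * 28, hidden_size=5)
criterion_bin = nn.BCELoss()
```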
That's basically what I wanted to show you; the binary structure is also what I used in the logistic regression tutorial, so check that out if you haven't already. That's all for now. I hope you enjoyed it and understood everything; if you have any questions, leave them in the comments below, and if you liked this tutorial, please subscribe to the channel. See you next time. Bye.

Hi everybody, welcome to a new PyTorch tutorial. This time I want to talk about activation functions, an extremely important feature of neural networks, so let's look at what activation functions are, why they are used, what different types there are, and how we incorporate them into our PyTorch model. Activation functions apply a non-linear transformation to the layer output and basically decide whether a neuron should be activated or not. Why do we use them, and why is a linear transformation alone not good enough? Typically we have a linear layer in our network that applies a linear transformation: it multiplies the input with some weights, maybe adds a bias, and delivers the output. If we had no activation functions in between, we would just stack linear transformations on top of each other, so the whole network from input to output would essentially be one big linear regression model, which is not suited for more complex tasks. The conclusion is that with non-linear transformations in between, the network can learn better and perform more complex tasks, so after each layer we typically apply an activation function: first the normal linear layer, then the activation function. Now let's talk about the most popular activation functions; the ones I want to show you are the binary step function, the sigmoid, the hyperbolic tangent, the ReLU, the leaky ReLU, and the softmax. Let's start with the simple step function: it outputs 1 if the input is greater than a threshold (here the threshold is 0) and 0 otherwise. It's not actually used in practice, but it demonstrates the idea of a neuron being activated or not. A more popular choice is the sigmoid function, which you should already know from my logistic regression tutorial: the formula is 1 / (1 + e^(-x)), it outputs a probability between 0 and 1, and it's typically used in the last layer of a binary classification problem. Then we have the hyperbolic tangent, or tanh: this is basically a scaled and slightly shifted sigmoid, it outputs values between minus one and plus one, and it is actually a good choice for hidden layers, so you should know about the tanh function.
Then we have the ReLU function, which is the most popular choice in most networks: it outputs 0 for negative values and simply passes positive values through as they are. So it is a linear function for values greater than 0 and zero for negative values; it doesn't look that different from a plain linear transformation, but it is in fact non-linear, and it's typically a very good choice for an activation function. The rule of thumb is: if you don't know which function to use, just use ReLU for the hidden layers. We also have the leaky ReLU, a slightly modified and slightly improved version of the ReLU: it still outputs the input for x greater than zero, but it multiplies negative inputs by a very small value a, for example 0.001. This tries to solve the so-called vanishing gradient problem: with a normal ReLU the output is zero for negative values, which means the gradient in the backpropagation is also zero, and when the gradient is zero those weights are never updated, so the corresponding neurons never learn anything; we say these neurons are dead. So whenever you notice that your weights stop updating during training, try the leaky ReLU instead of the normal ReLU. The last function I want to show you is the softmax, which you should also already know because I have a whole tutorial about it: it squashes the inputs to outputs between 0 and 1 so that we get probabilities, and it's typically a good choice for the last layer of a multi-class classification problem. Those are the different activation functions; now let's jump to the code and see how to use them in PyTorch. We have two options. The first is to create the functions as nn modules: in the network's __init__ we first define all the layers we want, for example a linear layer, then the ReLU module, which we get from torch.nn (this module contains all the different functions I just showed you), then the next linear layer, and then a sigmoid at the end; in the forward pass we simply call these modules one after the other, feeding the output of each into the next. The second option is to use the functions directly: in __init__ we only define the linear layers, and in the forward pass we apply a linear layer and then call torch.relu and torch.sigmoid on the outputs, straight from the torch API.
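A minimal sketch of both options, assuming a binary classifier with one hidden layer (the sizes are placeholders):

```python
import torch
import torch.nn as nn

# option 1: activation functions as nn modules, created in __init__
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(NeuralNet, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        out = self.relu(self.linear1(x))
        return self.sigmoid(self.linear2(out))

# option 2: call the functions directly in the forward pass
class NeuralNet2(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(NeuralNet2, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.linear2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out = torch.relu(self.linear1(x))
        # torch.nn.functional (often imported as F) has the ones the plain
        # torch API lacks, e.g. F.leaky_relu
        return torch.sigmoid(self.linear2(out))
```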
Both ways achieve the same thing; it's just a question of how you prefer to structure your code. All the functions I showed you are available as nn modules: nn.ReLU, nn.Sigmoid, nn.Softmax, nn.Tanh, and nn.LeakyReLU. They are also available in the torch API, for example torch.relu, torch.sigmoid, torch.softmax, and torch.tanh. Sometimes a function is not available in the torch API directly but is available in torch.nn.functional, which I import as F: F.relu is the same as torch.relu, but F.leaky_relu, for example, is only available in this functional API. So that's how we use activation functions in PyTorch; it's actually very easy, and I hope you understood everything and now feel comfortable with activation functions. If you liked this, please subscribe to the channel, and see you next time. Bye.

Hi everybody, welcome to a new PyTorch tutorial. Today we implement our first multi-layer neural network that can do digit classification on the famous MNIST dataset. In this tutorial we put together everything from the previous ones: we use the DataLoader to load the dataset, apply a transform to it, implement a neural net with an input layer, a hidden layer, and an output layer, apply activation functions, set up the loss and the optimizer, implement a training loop with batch training, and finally evaluate the model and calculate the accuracy. Additionally, we make sure the whole code can also run on the GPU if we have GPU support. First we import the things we need: torch, torch.nn as nn, torchvision for the datasets, torchvision.transforms as transforms, and matplotlib.pyplot as plt to show some data later. Then the device configuration: device = torch.device('cuda' if torch.cuda.is_available() else 'cpu'); later we push our tensors to this device, which guarantees the code runs on the GPU if it is supported. Next some hyperparameters: input_size = 784, because we will see that the images have size 28 by 28 and we flatten each one into a 1D tensor, and 28 times 28 is 784; hidden_size = 100 (you can try out different sizes here); num_classes = 10, because we have ten classes, the digits 0 to 9; num_epochs = 2 so training doesn't take too long (you can set it higher); batch_size = 100; and learning_rate = 0.001. Now let's import the famous MNIST data, available from the PyTorch library, by saying train_dataset equals, and here we use
torchvision.datasets.mnist and this will have to have the root where it has to be stored so root equals and here this should be in the same folder so dot and then it should create a folder called data and then we say train equals true so this is our training data set and then we say we apply a transform right away so we say transform equals transforms dot to tensor so we convert this to a tensor here and then we also say download equals true so it should be downloaded if it is not available already then let's copy this and do the same thing with our test data set and here we have to say train equals false and we also don't have to download this anymore so now let's continue and create the data loaders by saying train loader equals and here we get this from torch dot utils dot data dot data loader and then it will have to have the data set by saying data set equals and here it gets the training data set so train data set then we have to specify the batch size so this is equal to the batch size and then we also have to say or we can say shuffle equals true so this is pretty good for training and then we copy this again and do the same thing for our test loader so test loader equals gets the test data set and we can say shuffle equals false because it doesn't matter for the evaluation and now let's have a look at one batch of this data by saying examples equals and then we convert it to a iter object iter of the drain loader and then we can call the next method and unpack this into samples and into labels by saying this equals examples dot next and now let's print the size of these so let's print samples dot shape and also print print the labels dot shape and now let's save this and run this so let's call python feedforward.pi to see if this is working so far and yes here we have the size of the samples so this is 100 by 1 by 28 by 28 and this is because our batch size is 100 so we have 100 samples in our batch then the one is because we only have one channel so we don't have any colored channels here so only one channel and this is our actual image array so 28 by 28 as i said in the beginning and our labels is only a tensor of size 100 so for each class label we have one value here so yeah this is our some example data and now let's also plot this here to see how this is looking so for i in range six and here we use matplotlib so i call plt dot subplot of the with two rows and three columns and the index i plus one and then i can say plt dot m show and here i want to show the actual data so samples of i and then of 0 because we want to access the first channel and then i will also give this a color map so c map equals gray and then i say plt.show and let's save this and run this again and here we have a look at the data so these are some example handwritten digits and now we want to classify these digits so for this we want to set up a fully connected neural network with one hidden layer so let's do this so let's comment this out again and now let's create a class neural net and this has to be derived from n n nn.module and now we have to define the init and the forward method so the init method so this will get self and then it will has to have the input size then the hidden size and then the output size so the output size is the number of classes and here first we want to call the super in it so super of neural net and self and dot init self dot init and then we create our layers so first we want to have a linear layer by saying self dot l1 equals nn dot linear and this will have has the 
input size as input and the output size is the hidden size then after the first layer we want to apply a activation function and here i simply use the famous relu activation so self.relu equals nn.re lu and then at the end we have another linear layer so self.l2 equals nn dot linear and now we have to be careful so the input size here is the hidden size and the output size is the number of classes and now let's define the forward method so this will have self and one sample x and now we apply all these layers so we say out equals and now we use the first layer l1 which gets the sample x and then the next out is self.relu now use the actuation function which will get the previous output here and the last out equals self dot l2 and out so this will apply the second linear function and now we have to be careful again because here at the end we don't want an activation function so we don't apply the softmax here as usual in in multi-class classification problems because in a second we will see that we will use the cross entropy loss and this will apply the softmax for us so no softmax here so we simply say return out so this is our whole model and then we can create it here by saying model equals neural net and this will get the input size then the hidden size and the number of classes so yeah now we have the model so now let's come create the loss and the optimizer so here we say criterion equals nn dot cross entropy loss and this will apply the softmax for us so that's why we don't want this here so be very careful about this and now let's create our optimizer as well by saying tor optimizer equals torch dot optim dot um now let's use the atom optimizer here and this has to get the parameters and here we can use model dot parameters and it also has to get the learning rate lr equals learning rate um now we have the loss and the optimizer and now we can do our training loop so training loop now and for this let's first uh define the number of total steps so n total steps equals and this is the length of the training loader so now we can do the typical loop so we say for epoch in range num epochs and so this will loop over the epochs and now we loop over all the batches so here we say for i and then again we unpack this so we say images images and labels and then we iterate over enumerate over our train loader so the enumerate function will give us the actual index and then the data and the data here is the tuple of the images and the labels and now we have to reshape our images first because um if we have a look at the shape then we see that this is 100 by 1 by 28 by 28 as i showed you in the beginning and now we set our input size is 784 so our images tensor needs the size 100 by and 784 a second dimension so the number of patches first so let's reshape our our tensor first so we can do this by saying images equals images dot reshape and here we put in -1 as the first dimension so then tensor can find out this automatically for us um and here as second dimension we want to have 28 by 28 and then we also call to device so we will push this to the gpu if it is available and we have also have to push it to the push the labels to the device so labels equals labels to device and now let's do the forward pass so first we do the forward pass and afterwards the backward pass so the forward pass we simply say outputs equals model and this will get the images and then we calculate the loss by saying loss equals and then here we call our criterion and this will get the predicted outputs and the actual 
labels. That's the forward pass. In the backward pass, the first thing we do is call optimizer.zero_grad() to empty the values in the gradient attribute, then loss.backward() to run the backpropagation, and then optimizer.step(), which performs an update step on the parameters for us. Let's also print the loss: every 100th step, if (i + 1) % 100 == 0, we print the current epoch out of num_epochs, the current step i + 1 out of n_total_steps, and loss.item() formatted to four decimal places. That's the whole training loop, so now let's do the testing and evaluation. Here we don't want to compute gradients for the steps we take, so we wrap everything in a with torch.no_grad() statement. We start with n_correct = 0 and n_samples = 0, then loop over all the batches in the test samples: for images, labels in test_loader, reshape the images and push them and the labels to the device as before, and compute the predictions with outputs = model(images), using the now trained model on the test images. We get the actual predictions with _, predictions = torch.max(outputs, 1): torch.max returns the value and the index, and we are interested in the index, which is the class label, so we don't need the first return value. Then n_samples += labels.shape[0], which is the number of samples in the current batch (it should be 100), and n_correct += (predictions == labels).sum().item(), which adds one for every correct prediction. When the loop is done we calculate the total accuracy as acc = 100.0 * n_correct / n_samples, the accuracy in percent, and print it. Now let's save and run this and hope that everything works: training starts, and we see the loss decrease with every step; sometimes it goes up again, but overall it gets lower and lower. Testing is very fast, and the accuracy is 94.9 percent, so it worked: our first feed-forward model is done.
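Here is a condensed sketch of the whole feed-forward script described above; the hyperparameters match the ones used in the tutorial, while the data plotting and the intermediate print statements are omitted for brevity.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# hyperparameters
input_size, hidden_size, num_classes = 784, 100, 10
num_epochs, batch_size, learning_rate = 2, 100, 0.001

train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.l1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.l2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # no softmax at the end: nn.CrossEntropyLoss applies it for us
        return self.l2(self.relu(self.l1(x)))

model = NeuralNet(input_size, hidden_size, num_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# training loop with batch training
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # flatten 100 x 1 x 28 x 28 to 100 x 784 and push to the device
        images = images.reshape(-1, 28 * 28).to(device)
        labels = labels.to(device)
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# evaluation: no gradient tracking needed
with torch.no_grad():
    n_correct, n_samples = 0, 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28 * 28).to(device)
        labels = labels.to(device)
        _, predictions = torch.max(model(images), 1)
        n_samples += labels.shape[0]
        n_correct += (predictions == labels).sum().item()
    print(f'accuracy = {100.0 * n_correct / n_samples:.2f} %')
```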
I hope you understood everything and enjoyed this. If you liked it, please subscribe to the channel, and see you next time. Bye!

Hi everybody, welcome to a new PyTorch tutorial. Today we are implementing a convolutional neural network and doing image classification on the CIFAR-10 dataset. CIFAR-10 is a very popular image dataset with 10 different classes — airplanes, cars, birds, cats, and others — and it is available directly in PyTorch, so we will create a convolutional neural net that can classify these images.

First, let's talk about convolutional neural networks very briefly. I won't go into too much detail, because this tutorial is focused on the PyTorch implementation, but I will provide further links in the description if you want to learn more. Convolutional neural nets, or ConvNets, are similar to ordinary neural networks: they are made up of neurons that have learnable weights and biases. The main difference is that convolutional nets mainly work on image data and apply so-called convolutional filters. A typical ConvNet architecture looks like this: we have the input image, then several convolutional layers with optional activation functions, followed by so-called pooling layers — these layers are used to automatically learn features from the images — and at the end one or more fully connected layers for the actual classification task.

The convolutional filters work by sliding a filter kernel over the image. We put the filter at the first position in the input image, compute the output value by multiplying and summing up all the values under it, write that value into the output image, and then slide the filter to the next position, doing the same filter operation until we have covered the whole image. With this operation the resulting image may be smaller, because the filter does not fit into the corners — except if we use a technique called padding, which we won't cover in this lecture. Getting the correct size is an important step that we will see later in practice.

Let's also talk briefly about pooling layers, in this case max pooling. Max pooling is used to downsample an image by applying a maximum filter to sub-regions: with a 2x2 filter, we look at each 2x2 sub-region of the original image and write the maximum value of that region into the output image. Max pooling reduces the computational cost by reducing the size of the image, which reduces the number of parameters the model has to learn, and it also helps avoid overfitting by providing an abstracted form of the input. These are all the concepts we must know — again, please check out the provided links if you want to learn more.
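As a tiny illustration of these two operations (not part of the tutorial script), you can apply a convolution and a max pool to a dummy tensor and watch the sizes change:

```python
import torch
import torch.nn.functional as F

img = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)   # batch, channel, height, width
kernel = torch.ones(1, 1, 3, 3) / 9.0                             # a simple 3x3 averaging filter

out = F.conv2d(img, kernel)                          # multiply-and-sum at every position -> (1, 1, 3, 3)
pooled = F.max_pool2d(out, kernel_size=2, stride=2)  # 2x2 max pooling -> (1, 1, 1, 1)
print(out.shape, pooled.shape)
```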
Now, enough theory — let's get to the code. Here I have already written most of what we need: we import everything, make sure we have GPU support, and define the hyperparameters (if you don't know how I structure my PyTorch files, please watch the previous tutorials, where I explain all of these steps). First we load the CIFAR-10 dataset, which, as I said, is available directly in PyTorch through the torchvision datasets module. Then we define the PyTorch datasets and data loaders, so we automatically get batching and batch training, and I defined and hard-coded the class names. Then comes the part we still have to implement: the convolutional net. After that, as always, we create the model, then the loss and the optimizer — since this is a multi-class classification problem we use the cross-entropy loss, and as optimizer we use stochastic gradient descent, which optimizes the model parameters with the defined learning rate. Then we have the typical training loop with batch optimization: we loop over the number of epochs, loop over the training loader to get the different batches, push the images and labels to the device for GPU support, do the typical forward pass and compute the loss, and then do the backward pass — where we must not forget to empty the gradients first with zero_grad — call the backward function and optimizer.step, and print some information. When we are done we evaluate the model, wrapped as always in a with torch.no_grad() block because we don't need backpropagation and gradient calculations here, and calculate the accuracy of the total network as well as the accuracy for each single class. You can also find this script on my GitHub, so please check it out there.

The only thing missing is the convolutional net itself. We define a class ConvNet, which must inherit from nn.Module, and as always we implement the __init__ function and the forward function for the forward pass. Let's look at the architecture again: first a convolutional layer followed by a ReLU activation function, then max pooling, then a second convolutional layer with a ReLU and max pooling, then three fully connected layers, and at the very end the softmax and the cross-entropy — the softmax is already included in the cross-entropy loss, so we don't need to take care of it ourselves.

So let's create these layers. The first convolutional layer is self.conv1 = nn.Conv2d(3, 6, 5): the input channel size is 3 because our images have three color channels, the output channel size is 6, and the kernel size is 5, i.e. 5x5. Then a pooling layer, self.pool = nn.MaxPool2d(2, 2), with a kernel size of 2 and a stride of 2 — exactly as in the image we saw: the kernel is 2x2, and after each operation we shift two pixels to the right, which is why the stride is 2. Then the second convolutional layer, self.conv2 = nn.Conv2d(6, 16, 5): the input channel size must be equal to the previous output channel size, so 6, the output channel size is 16, and the kernel size is still 5. Now we have our convolutional layers; the fully connected layers come next.
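Collected in one place, the constructor looks roughly like this; the fully connected sizes are already included here and are derived right below.

```python
import torch.nn as nn

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels, 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 window, stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)       # 6 -> 16 channels, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # why 16*5*5 is explained next
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # 10 output classes
```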
The first fully connected layer is self.fc1 = nn.Linear(16 * 5 * 5, 120). I will explain in a second why the input size must be 16 * 5 * 5; the output size, 120 here, is something you can experiment with. The next fully connected layer, self.fc2, then has 120 input features and, say, 84 output features, and the final layer, self.fc3, has an input size of 84 and an output size of 10, because we have 10 different classes. So you can change the 120 and the 84, but the 16 * 5 * 5 must be fixed, and the 10 must be fixed as well.

Now let's look at why the input has to be exactly this number. Here I have a little script, cnn_test.py, that does the same thing: it loads the dataset, plots some images, and has the same layers — the first convolutional layer, the pooling layer, and the second convolutional layer. Let's run it with python cnn_test.py and plot the images first (I've already downloaded the data): they are very blurred, but I think you can make out a horse, maybe a bird, and another horse — and if we run it again we get perhaps a deer, a car, a frog, and a ship.

Now let's look at the sizes. First we print images.shape, which is [4, 3, 32, 32]: the batch size is 4, we have three color channels, and the images are 32 by 32. Then we apply the first convolutional layer, x = conv1(images), and print the size after this operation: [4, 6, 28, 28]. The 6 is there because we now have six output channels, as we specified, and the image size is 28 by 28 because, as I said, the resulting image may be smaller since the filter doesn't fit into the corners. The formula for the output size is (W - F + 2P) / S + 1: the input width minus the filter size, plus two times the padding, divided by the stride, plus one. In the earlier example we had a 5x5 input, a 3x3 filter, padding 0, and stride 1, so the output size is (5 - 3 + 0) / 1 + 1 = 3, which is why the output image there was 3 by 3. Applying the same formula in our case: (32 - 5 + 0) / 1 + 1 = 28, which is why we get 28 by 28 here.
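The formula is easy to wrap in a small helper if you want to check the sizes yourself:

```python
def conv_output_size(w, f, p=0, s=1):
    """Output width of a convolution: (W - F + 2P) / S + 1."""
    return (w - f + 2 * p) // s + 1

print(conv_output_size(5, 3))    # 3, the small example above
print(conv_output_size(32, 5))   # 28, after conv1 on a 32x32 CIFAR-10 image
```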
Next comes the pooling layer. After applying it the size is [4, 6, 14, 14], because, as in the example, a pooling layer with a 2x2 kernel and a stride of 2 reduces the images by a factor of two. Then we apply the second convolutional layer; instead of working through the formula again, we simply let PyTorch tell us the size after this operation: it is [4, 16, 10, 10] — 16 because that is the output channel size we specified, and the resulting image is 10 by 10. Then another pooling operation halves the size again, so the final size after both convolutional layers and both pooling layers is [4, 16, 5, 5].

Now, when we pass this into the classification layers we want to flatten the 3D tensor into a 1D tensor, and if we look at the size, that is exactly why the input size of the first linear layer must be 16 * 5 * 5. Getting this size right is very important, and now we know where it comes from. So now we have all the layers defined with the correct sizes.

Next we apply them in the forward pass. First x = self.conv1(x), then an activation function — I imported torch.nn.functional as F, so I can call F.relu on it — and then the first pooling layer, self.pool, wrapped around that. By the way, the activation function does not change the size. That is the first convolutional and pooling stage, and we do the same thing with the second convolutional layer. To pass the result to the first fully connected layer we have to flatten it, which we do with x = x.view(-1, 16 * 5 * 5); with -1 as the first size, PyTorch automatically figures out the correct number of samples in the batch (4 in this case). Now our tensor is flattened, and we call the first fully connected layer, x = self.fc1(x), apply an activation function — again simply ReLU; I also have a whole tutorial about activation functions, so please check that out if you haven't already — then the second fully connected layer with a ReLU activation, and at the very end x = self.fc3(x), the last fully connected layer, with no activation function and no softmax, because that is already included in the loss we set up. Then we simply return x, and that is the whole convolutional net model — now you should know how to set it up.
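As a recap, the forward method of the ConvNet class just described looks roughly like this (F is torch.nn.functional, imported as described above):

```python
import torch.nn.functional as F

# belongs inside the ConvNet class sketched earlier
def forward(self, x):
    x = self.pool(F.relu(self.conv1(x)))   # -> (N, 6, 14, 14)
    x = self.pool(F.relu(self.conv2(x)))   # -> (N, 16, 5, 5)
    x = x.view(-1, 16 * 5 * 5)             # flatten to (N, 400)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x)                        # no softmax: included in CrossEntropyLoss
    return x
```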
Then we create the model and continue with the training loop I already showed you. Let's save this and run it with python cnn.py and hope that the training starts — oh, one thing I forgot, of course, is to call the super init: never forget super(ConvNet, self).__init__(). Let's clear this and try one more time; now the training starts. I don't have GPU support on my MacBook, so this can take a few minutes, so I'll skip ahead and continue when the training is done. See you in a second.

All right, we are back, the training has finished, and we can see that the loss slowly decreased. In the final evaluation the accuracy of the total network is 46.6 percent, and the accuracy of each class is listed as well. That's not very good, but it's because we only specified four epochs — you might want to try more. In any case, you now know how a convolutional neural net can be implemented. I hope you enjoyed this tutorial; if so, please leave a like, subscribe to the channel, and see you next time. Bye!

Hi everybody, welcome to a new PyTorch tutorial. In this tutorial we will talk about transfer learning and how it can be applied in PyTorch. Transfer learning is a machine learning method where a model developed for a first task is reused as the starting point for a model on a second task. For example, we can train a model to classify birds and cats, modify it only a little bit in the last layer, and then use the new model to classify bees and dogs. It's a popular approach in deep learning that allows rapid generation of new models, and this is super important because training a completely new model can be very time-consuming — it can take multiple days or even weeks. With a pretrained model we typically exchange only the last layer and do not need to train the whole model again, yet transfer learning can still achieve pretty good performance, which is why it's so popular nowadays.

Let's look at this picture: we have a typical CNN architecture, like the one from the last tutorial, and let's say it has already been trained on a lot of data, so we have optimized weights. Now we take only the last fully connected layer, modify it, and train just that layer on our new data; then we have a new model that has been trained and tweaked only in the last layer. That is the concept of transfer learning.

Now let's look at a concrete example in PyTorch. We are using the pretrained ResNet-18 CNN, a network trained on more than a million images from the ImageNet database; it is 18 layers deep and can classify images into 1000 object categories. In our example we only have two classes: we only want to detect bees and ants. In this session I also want to show you two other new things — the datasets.ImageFolder class and how to use a scheduler to change the learning rate — and then, of course, how transfer learning is used.

I already imported everything we need, and now we set up the data. Last time we used the built-in datasets from torchvision; this time we use datasets.ImageFolder, because our data is saved in folders with this structure: a root folder containing a train and a val folder, and inside each of those a folder for each class — ants and bees — containing the images (here are some ants, and here is a bee). You must structure your folders like this; then you can call datasets.ImageFolder with the path and some transforms, and you get the class names from the image dataset's classes attribute.
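A sketch of the ImageFolder setup described above; the folder path and the exact transforms are assumptions for illustration (the original script uses its own data directory and ImageNet normalization):

```python
import torch
from torchvision import datasets, transforms

data_transforms = {
    'train': transforms.Compose([transforms.RandomResizedCrop(224), transforms.ToTensor()]),
    'val': transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor()]),
}
data_dir = 'data/hymenoptera_data'  # assumed path with train/ and val/ subfolders
image_datasets = {x: datasets.ImageFolder(f'{data_dir}/{x}', data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4, shuffle=True)
               for x in ['train', 'val']}
class_names = image_datasets['train'].classes   # ['ants', 'bees']
```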
Then I defined a train_model function that runs the training loop and the evaluation. I won't go into detail here — you should already know from the last tutorials what a typical training and evaluation loop looks like, and you can check the whole code on GitHub; I will put the link in the description, so have a look at it yourself.

Now let's use transfer learning. First we import the pretrained model, which is available in the torchvision.models module (already imported): model = models.resnet18(pretrained=True), so we get the weights that were already optimized on the ImageNet data. Then we exchange the last fully connected layer. First we get the number of input features of the existing last layer with num_features = model.fc.in_features, and then we create a new layer and assign it to the last layer: model.fc = nn.Linear(num_features, 2) — the same number of input features, but only two output features, because we now have two classes. Then we send the model to the device we created at the beginning, as always, cuda or simply cpu.

Now that we have our new model, we define the loss and optimizer as always: criterion = nn.CrossEntropyLoss(), and optimizer = optim.SGD(model.parameters(), lr=0.001) from the optimization module, stochastic gradient descent optimizing the model parameters with the given learning rate. As a new thing, let's use a scheduler to update the learning rate: step_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1) — lr_scheduler is also available in the torch optimization module, which we already imported. This means that every seven epochs the learning rate is multiplied by 0.1, so every seven epochs it is reduced to ten percent of its previous value. Typically you then use it like this: in the loop over the epochs (for epoch in range(...)), you do the training, which calls optimizer.step(), then you evaluate, and then you also call scheduler.step(). That is how a scheduler is used — please look at the whole loop here yourself.

Now that the scheduler is set up, let's call the training function: model = train_model(model, criterion, optimizer, step_lr_scheduler, num_epochs=20) — the function I created, which gets the model, the criterion, the optimizer, the scheduler, and the number of epochs.
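The fine-tuning setup just described, sketched in one block (device and the train_model helper are assumed to exist as defined earlier in this section):

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import models

model = models.resnet18(pretrained=True)     # weights pretrained on ImageNet
num_features = model.fc.in_features          # input size of the existing last layer
model.fc = nn.Linear(num_features, 2)        # new head for our two classes (ants, bees)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)
step_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)  # lr *= 0.1 every 7 epochs

model = train_model(model, criterion, optimizer, step_lr_scheduler, num_epochs=20)
```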
This is one way of using transfer learning; in this case the technique is called fine-tuning, because we train the whole model again, just a little bit — we fine-tune all the weights on the new data, together with the new last layer. The second option is to freeze all the layers in the beginning and train only the very last layer. For this, after we get the model, we loop over all its parameters — for param in model.parameters() — and set the requires_grad attribute to False: param.requires_grad = False. This freezes all the layers in the beginning. Then we set up the new last layer as before — a newly created layer has requires_grad = True by default — and then we again set up the loss, the optimizer, and the scheduler, and call the training function again. This option is even faster.
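The frozen variant differs only in the requires_grad loop; a sketch under the same assumptions as above:

```python
model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False                  # freeze all pretrained layers

model.fc = nn.Linear(model.fc.in_features, 2)    # new layer, requires_grad=True by default
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)
step_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
model = train_model(model, criterion, optimizer, step_lr_scheduler, num_epochs=2)
```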
Let's run this and look at both evaluations — I also print out the time each one takes. So let's save it and run python transfer.py; it first downloads all the images, and this might take a while because I don't have GPU support on my MacBook, so I'll skip ahead and see you in a second.

All right, I'm back. This took so long on my computer that I reset the number of epochs to just two for this example. Let's look at the results after only two epochs: the first training, where we fine-tuned the whole model, took about three and a half minutes, and the best accuracy is 0.92, so 92 percent. The second training, where we only trained the last layer, took only about one and a half minutes, and the accuracy is already over 80 percent — of course not as good as training the whole model, but still pretty good for only two epochs, and now imagine setting the number of epochs even higher. This is why transfer learning is so powerful: we have a pretrained model, fine-tune it only a little bit, do a completely new task, and still achieve pretty good results. I hope you understood how transfer learning can be applied in PyTorch. If you enjoyed this tutorial, please subscribe to the channel, and see you next time. Bye!

Hey guys, welcome to a new PyTorch tutorial. In this video we will learn how to use TensorBoard to visualize and analyze our model and training pipeline. TensorBoard is a visualization toolkit for experimenting with our models; it is actually developed by the TensorFlow team, but it can be used with PyTorch as well. On the official website we can see a few of the things it can do: for example, track and visualize metrics such as the loss and the accuracy, visualize the model graph, view histograms, project embeddings to a lower-dimensional space, display images, text, and audio data, profile our programs, and much more. Now I want to show you how to use this in our code; I'm going to use the code from tutorial number 13 — if you haven't watched that one, I recommend watching it first, but I will briefly explain the code again now. In that tutorial we did digit classification on the MNIST dataset: we load the MNIST data, plot some of the images, and create a simple feed-forward neural network — a fully connected network with one hidden layer, so a linear layer first, then a ReLU activation function, then another linear layer, and that's the whole forward pass. Then we set up the training pipeline with the loss and optimizer, do the training — as always a forward pass, a backward pass, and a weight update — and at the end we evaluate the model and print the accuracy.

Now let's use TensorBoard with this code to analyze the model a little bit more. The first thing is to install it: pip install tensorboard installs everything we need (in my case I had already installed it, so this was fast), and we do not have to install the whole TensorFlow library — TensorBoard is enough. Then we start TensorBoard by saying tensorboard --logdir=runs, where --logdir specifies the path where the log files are saved (runs is the default directory), and it starts up at localhost:6006. There is a warning that it doesn't find TensorFlow and will run with a reduced feature set, but that is fine. Opening TensorBoard, we see that no dashboards are active yet, because we haven't written any data — so let's do that.

Back in the code, the first thing is to import the writer: from torch.utils.tensorboard import SummaryWriter. Then we create one: writer = SummaryWriter('runs/mnist') — the default directory is, as I said, the runs folder, but let's be more specific and log into runs/mnist. Now, the first thing we want to do: in the code we plotted some example images; instead of plotting, let's add them to our TensorBoard. For this, the only thing we have to do is create a grid and call the writer's add_image method: img_grid = torchvision.utils.make_grid(example_data), where example_data is one batch of our example data, and then writer.add_image('mnist_images', img_grid), where the first argument is a label for the images. I also import sys and use an early exit, sys.exit(), because I don't want to run the whole training pipeline right now, and before that I call writer.close() to make sure all the events are written and the outputs are flushed. Now let's save this, go to the terminal, run python feedforward.py, go back to TensorBoard, and reload — and we see that we have our images.
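In code, the writer setup and the add_image call look like this (example_data is assumed to be one batch of MNIST images pulled from the loader, as in the original script):

```python
import sys
import torchvision
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs/mnist')

img_grid = torchvision.utils.make_grid(example_data)
writer.add_image('mnist_images', img_grid)
writer.close()   # flush all pending events to disk
sys.exit()       # early exit so the full training pipeline does not run yet
```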
In the Images tab we now have the grid we just created; it is 8 by 8 because we specified a batch size of 64. Now we can analyze our data — and let's go ahead and do more with TensorBoard.

The next thing is to add a graph to analyze our model. Scrolling down to where we create the neural net, let's comment the sys.exit out again; after the model and the loss and optimizer are created, we add the model graph with writer.add_graph(model, example_data.reshape(-1, 28*28)): it gets the model and an example input — one batch of our example data, reshaped the same way as in the training loop. Then again writer.close() and sys.exit(), run the file again, and head over to TensorBoard and reload: now there is also a Graphs tab. Opening it, we see our model — the input and then the neural net — and a double click shows more detail: the whole model with the first linear layer, the ReLU activation function, the second linear layer, and the weights and biases of each linear layer. We can inspect this further if we want, and it is really helpful for analyzing the structure of our model.

Now that we have the model graph, let's analyze some metrics. In the original script, during training we simply printed the current loss every 100 steps; instead of just printing, let's add the training loss — and also the accuracy — to TensorBoard. We want the mean loss over each block of 100 steps, so before we start the loop we add two values: running_loss = 0.0 and running_correct = 0 in the beginning. Then in each iteration we add the loss to the running loss, running_loss += loss.item(), and we add the number of correct predictions to running_correct: we get the predictions the same way as in the evaluation, _, predicted = torch.max(outputs, 1), and then running_correct += (predicted == labels).sum().item() — the sum is a tensor with only one item, so we call .item(). Now, every 100th step we calculate the mean value and add it to TensorBoard with writer.add_scalar('training loss', running_loss / 100, epoch * n_total_steps + i): first the label, then the running loss divided by 100 (because we summed it up over 100 steps), and then the current global step, which is the epoch times the total number of steps (the length of the training loader, extracted earlier) plus i, the current batch iteration. That adds the training loss; now let's do the same thing again for the accuracy.
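A sketch of the graph and scalar logging (model, example_data, criterion, optimizer, train_loader, device, and num_epochs come from the script above; the accuracy scalar added in the next step is already included here):

```python
writer.add_graph(model, example_data.reshape(-1, 28 * 28).to(device))

running_loss = 0.0
running_correct = 0
n_total_steps = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.reshape(-1, 28 * 28).to(device)
        labels = labels.to(device)

        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        running_correct += (predicted == labels).sum().item()

        if (i + 1) % 100 == 0:
            global_step = epoch * n_total_steps + i
            writer.add_scalar('training loss', running_loss / 100, global_step)
            writer.add_scalar('accuracy', running_correct / 100, global_step)  # average correct predictions per step
            running_loss = 0.0
            running_correct = 0
writer.close()
```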
For the accuracy the scalar value is running_correct / 100, and after writing both scalars we have to reset the running values: running_loss = 0.0 and running_correct = 0 again. Now let's save this, comment the sys.exit out, and run the whole training pipeline. We still see the printed output every 100 steps showing how the loss decreases, then we are done and also see the overall accuracy of the network. Going back to TensorBoard and hitting reload, there is one more entry at the top, the Scalars tab, with our two plots: we see the accuracy at each of these steps and how the training loss is decreasing, so it worked. By default TensorBoard smooths these curves, and we can modify the smoothing parameter. Now we can analyze how the loss decreases — for example, if we see that at some point it stops decreasing further, we know we have to improve something, and one of the first things to optimize is usually the learning rate. So let's modify the learning rate, change the log folder to runs/mnist2, clear the terminal, and run the script again; this should update TensorBoard while the script is running, and indeed we see a second curve in the accuracy plot and also in the loss plot, with the different learning rate. This is how we can interactively optimize and analyze our model.

As a last thing I want to show you how to add a precision-recall curve. Precision-recall curves let you understand your model's performance under different threshold settings. They make the most sense for a binary classification problem, but if we analyze each class separately here, we do have a binary classification problem per class, so let's add a precision-recall curve for each class. For those of you who don't know what precision and recall mean, I have a link for you in the description — please check it out. Let's look at the official documentation (I also recommend checking out this link) and search for add_pr_curve: this method adds a precision-recall curve, and it needs a tag first, then the labels — the ground-truth data, a binary label for each element — and then the predictions, the probability that an element is classified as true, so values between zero and one. This is important: we need the actual labels and probability predictions here. So let's go to the code and add a precision-recall curve for each class: in our evaluation we create a list where we store the labels, labels = [], and a list for the predictions, preds = [], both empty in the beginning.
During the batch evaluation we append the predicted class labels to the labels list. For the predictions we have to be careful: we need probabilities between zero and one, but the outputs we get from our model are raw values — if we look at the neural net again, it has a linear layer at the end, and there is even a comment: no activation and no softmax at the end, because in this case that is applied inside the cross-entropy loss. In the evaluation, however, we want actual probabilities, and if you've watched my tutorial about activation functions, you know which one we must use here to get probabilities: the softmax function, which squeezes our values into probabilities between zero and one. So we call the softmax explicitly on our outputs. For this we import torch.nn.functional as F (capital F), and then we calculate the softmax for each output in our outputs using a list comprehension: class_predictions = [F.softmax(output, dim=0) for output in outputs], along dimension zero. Then we add this to our predictions list with preds.append(class_predictions).

When we are done with the for loop we convert the lists to tensors. For the labels we say labels1 = torch.cat(labels1): right now this is a list, and we want to concatenate all its elements along one dimension into a one-dimensional tensor. For the predictions we want a two-dimensional tensor — for each sample we want the stacked class probabilities — so we say preds = torch.cat([torch.stack(batch) for batch in preds]), stacking each batch and then concatenating. You should check the shapes of these tensors yourself: with 10,000 test samples the labels tensor should be of size 10,000 and the predictions tensor 10,000 by 10, one column per class.

The last thing is the actual PR curve. Our class labels are simply classes = range(10), because we have the digits zero to nine. We iterate over them: for i in classes, we take labels_i = labels1 == i and preds_i = preds[:, i] — all the samples, but only the column for class i — and call writer.add_pr_curve(str(i), labels_i, preds_i, global_step=0): the tag is just the class label as a string, then the labels, then the predictions, and as the global step we just use zero. Then writer.close() and we are done. Let's save this and run the script one more time — almost: I have a typo, I used two different labels variables, so let's just rename the list to labels1 everywhere.
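A sketch of the whole per-class PR-curve block, using the renamed labels1 list (model, test_loader, writer, and device as in the script above):

```python
import torch
import torch.nn.functional as F

labels1 = []
preds = []
with torch.no_grad():
    for images, labels in test_loader:
        images = images.reshape(-1, 28 * 28).to(device)
        labels = labels.to(device)
        outputs = model(images)

        _, predicted = torch.max(outputs, 1)
        labels1.append(predicted)

        # softmax turns the raw outputs into class probabilities between 0 and 1
        class_predictions = [F.softmax(output, dim=0) for output in outputs]
        preds.append(torch.stack(class_predictions))

labels1 = torch.cat(labels1)   # shape: (10000,)
preds = torch.cat(preds)       # shape: (10000, 10)

classes = range(10)
for i in classes:
    labels_i = labels1 == i
    preds_i = preds[:, i]
    writer.add_pr_curve(str(i), labels_i, preds_i, global_step=0)
writer.close()
```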
Sorry about that — let's clear this, actually save the file this time, and run it one more time, going through the training pipeline again. All right, now we are done, so let's reload TensorBoard one more time. There is one more entry at the top, the PR Curves tab, and we see the precision-recall curves for each of our class labels: label 0, label 1, and so on. We can inspect the precision and the recall for the different thresholds — precision is on the y-axis and recall on the x-axis — and for each threshold we can analyze how many true positives, false positives, true negatives, and false negatives we get. This is also really helpful for analyzing the model. That's all I wanted to show you about TensorBoard; I hope you enjoyed this tutorial, please consider subscribing to the channel, and see you next time. Bye!

Hey guys, welcome to a new PyTorch tutorial. Today I want to show you how we can save and load our model — the different methods and save options you have to know, and also what you have to consider when you're using a GPU. Let's start. There are only three methods you have to remember: torch.save, torch.load, and model.load_state_dict, and I will show you all of them in detail. torch.save can take tensors, models, or any dictionary as a parameter for saving — remember that it can save any dictionary, and I will show you later how to use that in a training pipeline. torch.save makes use of Python's pickle module to serialize the objects and save them, so the result is serialized and not human-readable.

For saving our model we have two options. The first one is the lazy method: we just call torch.save(model, PATH) on our model, where we also specify the path or file name. Later, when we want to load the model, we set it up with model = torch.load(PATH) and then set it to evaluation mode with model.eval(). The disadvantage of this approach is that the serialized data is bound to the specific classes and the exact directory structure used when the model was saved. The second option is the recommended way of saving a model: if we just want to save our trained model and use it later for inference, it is enough to save only the parameters. Since torch.save can save any dictionary, we can save the parameters by calling torch.save(model.state_dict(), PATH) — the state dict holds the parameters. Later, when we want to load the model again, we first create the model object and then call model.load_state_dict(torch.load(PATH)). Be careful here: load_state_dict does not take a path, it takes the loaded dictionary. Then again we set the model to evaluation mode. This is the preferred way, and the one you should remember.
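In code, the two options look like this; Model(n_input_features=6) stands in for the small example class used in this section:

```python
import torch

FILE = 'model.pth'

# Option 1 (lazy): serialize the whole model object
torch.save(model, FILE)
model = torch.load(FILE)
model.eval()

# Option 2 (recommended): save and restore only the parameters
torch.save(model.state_dict(), FILE)
loaded_model = Model(n_input_features=6)          # recreate the model object first
loaded_model.load_state_dict(torch.load(FILE))    # takes the loaded dict, not a path
loaded_model.eval()
```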
Now let's jump to the code and see the different saving options in practice. Here I have a little script where I defined a small model class, and here I created our model. Let me show you the lazy method first. We define the file name, FILE = 'model.pth' — it's common practice to use the ending .pth, short for PyTorch — and then we save the whole model with torch.save(model, FILE). Save this and run the script with python save_load.py; in the file explorer (we can ignore this warning) we now see the model.pth file, and if we open it we see that it is serialized data, so not human-readable. Back in the code, let's load the model: comment the saving out and say model = torch.load(FILE), and then remember to set it to evaluation mode with model.eval(). Then we can use the model, for example inspect the parameters with for param in model.parameters(): print(param). Save this, clear the terminal, and run the script again (let me make this larger for you): we see that we loaded the model and can use its parameters. That's the lazy option.

Now let me show you the preferred way. Instead of torch.save(model, FILE) we save the state dict: torch.save(model.state_dict(), FILE). Let's clear this, open the explorer and delete the old file, and run the script again: we have the file again, but now it only contains the state dict. To load the model we first have to define it, so let's call it loaded_model and create it with the same number of input features, six; then we call loaded_model.load_state_dict(torch.load(FILE)) — remember, inside we have to call torch.load with the file name — and then we set the loaded model to evaluation mode. Let's print the parameters of the loaded model, and further up also the parameters of the normal model — we don't do any training here, so the model is still initialized with random parameters. Running the script, let's check that the parameters are the same: it first prints the parameters of the normal model, then of the loaded model, and in both cases we see the same tensor with the weights and the same tensor with the bias, so this worked too. Again, this is the recommended way: save model.state_dict() and load it with the load_state_dict method. Since we just saved the state dict, let me show you what it looks like: if we print model.state_dict(), save, clear, and run the script, we see the linear layer's weight tensor and its bias tensor — that is our state dict, which holds the parameters.
Now let me show you a common way of saving a whole checkpoint during training. As you know, we can save any dictionary here. Let's say we also defined a learning rate, for example 0.001, and an optimizer, torch.optim.SGD(model.parameters(), lr=learning_rate) — stochastic gradient descent optimizing the model parameters with that learning rate. The optimizer also has a state dict, so we can print optimizer.state_dict() as well; if we clear this and run it, we see the optimizer's state dictionary, where we can see, for example, the learning rate and the momentum.

Now, during training, let's say we want to stop at some point and save a checkpoint. We can do it like this: we create the checkpoint as a dictionary. The first thing we save is, for example, the epoch, with the key 'epoch' and, say, the value 90 (let's say we are in epoch 90); then the model state, with the key 'model_state' and the value model.state_dict(); and then the optimizer state, with the key 'optim_state' and the value optimizer.state_dict(). That is our checkpoint, and now we can call torch.save(checkpoint, 'checkpoint.pth') to save the whole thing. If we clear the terminal and run the script, we see the checkpoint file in the explorer.

When we load it, we want the whole checkpoint back, so we comment the earlier code out and say loaded_checkpoint = torch.load('checkpoint.pth'). Then we have to set up the model and the optimizer again. Since the checkpoint is a dictionary, we can get the epoch right away by accessing the 'epoch' key. For the model, remember, we have to create it again, with the same number of input features (six), and we create the optimizer the same way as before — we don't actually have to use the same learning rate, so for example we can pass learning rate 0 and later check that the correct learning rate is loaded into the optimizer. Then model.load_state_dict(checkpoint['model_state']) loads all the parameters into the model, and the same for the optimizer: optimizer.load_state_dict(checkpoint['optim_state']). Now we have the loaded model, the optimizer, and the current epoch, so we can continue our training. Let me show you that this is all correct by printing optimizer.state_dict(): notice that we set the learning rate to zero and then loaded the state dict, and if we run this, the last printed optimizer state dict shows the same learning rate as the initial optimizer, so this worked too. That is how we can save and load a whole checkpoint, and these are all three ways of saving you have to know.
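The checkpoint pattern in one block (again with Model standing in for the small example class used here):

```python
import torch

checkpoint = {
    'epoch': 90,
    'model_state': model.state_dict(),
    'optim_state': optimizer.state_dict(),
}
torch.save(checkpoint, 'checkpoint.pth')

# later: restore everything and continue training
loaded_checkpoint = torch.load('checkpoint.pth')
epoch = loaded_checkpoint['epoch']

model = Model(n_input_features=6)
optimizer = torch.optim.SGD(model.parameters(), lr=0)   # lr gets overwritten by the loaded state
model.load_state_dict(loaded_checkpoint['model_state'])
optimizer.load_state_dict(loaded_checkpoint['optim_state'])
```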
As a last thing, I want to show you what you have to consider when you are using a GPU during training. If you do both training and loading on the CPU, you don't have to change anything — just do it like I did here. But if you save your model on the GPU and later want to load it on the CPU, do it this way: somewhere during training you set up your CUDA device, send the model to the device, and save the state dict; then, when loading to the CPU, you have your CPU device, create the model again, and call model.load_state_dict(torch.load(PATH, map_location=device)) — you have to specify the map_location and give it the CPU device. That covers saving on the GPU and loading on the CPU. If you want to do both on the GPU — you sent the model to the CUDA device, saved it, and want to load it on the GPU again — you just set up the model, load the state dict, and then send the model to the CUDA device. The third option: you saved the model on the CPU (you never sent it to a CUDA device), but later, during loading, you want to load it onto the GPU. Then you first specify your CUDA device, create the model, and call model.load_state_dict(torch.load(PATH, map_location='cuda:0')) — as map_location you specify 'cuda:' followed by whichever GPU device number you want, for example cuda:0 — and then you also have to call model.to(device), which sends the model, and with it all the loaded parameter tensors, to the CUDA device. After that you can continue with your training or inference on the GPU; of course you also have to send all the training samples you use for the forward pass to the device. So this is what you have to consider when you're using a GPU.
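The three device combinations, sketched (PATH and the example Model class are assumed as before; a CUDA device must actually be available for the GPU cases):

```python
import torch

# 1) save on GPU, load on CPU
device = torch.device('cuda')
model.to(device)
torch.save(model.state_dict(), PATH)
device = torch.device('cpu')
model = Model(n_input_features=6)
model.load_state_dict(torch.load(PATH, map_location=device))

# 2) save on GPU, load on GPU
model = Model(n_input_features=6)
model.load_state_dict(torch.load(PATH))
model.to(torch.device('cuda'))

# 3) save on CPU, load on GPU
device = torch.device('cuda')
model = Model(n_input_features=6)
model.load_state_dict(torch.load(PATH, map_location='cuda:0'))
model.to(device)   # also move the input tensors to the device for the forward pass
```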
Now you know all the different ways of saving and loading your model. That's all for now — I hope you enjoyed this tutorial, and if you did, please consider subscribing to the channel. See you next time, bye!

Info
Channel: Python Engineer
Views: 89,619
Rating: 4.9676352 out of 5
Keywords: Python, Machine Learning, ML, PyTorch, Deep Learning, DL, Python DL Tutorial, PyTorch Tutorial, Tensorboard, PyTorch Course, Neural Network, Course, CNN, Linear Regression, Logistic Regression, DataLoader, Optimizer, Loss, Crossentropy, Activation Function, Softmax, Transfer Learning, Model, Training
Id: c36lUUr864M
Length: 275min 42sec (16542 seconds)
Published: Wed Feb 24 2021