Real-World PyTorch: From Zero to Hero in Deep Learning & LLMs | Tensors, Operations, Model Training

Video Statistics and Information

Captions
Hey everyone, my name is Venelin, and in this video we're going to learn the foundations of PyTorch, a library that is used throughout the industry for building deep neural networks and even large language models in production. We're going to start with what tensors are and what you can do with them, then continue with how you can take real-world data and convert it into a format that is usable within PyTorch. Then we're going to take that data and train a very simple deep neural network with it: I'm going to show you how you can build your own model from scratch and then how you can train it with PyTorch tools. Finally, we're going to evaluate the model, and I'm going to show you how to create some pretty nice charts along the way. Let's get started. If you want to follow along, there is a complete text tutorial that is available for MLExpert Pro subscribers; you can find it within the bootcamp section, under Real-World PyTorch. There you can find all of the explanations that I'm giving throughout this video, as well as all the code that you need in order to follow along. I'm also going to give you some practical tips and tricks, so please consider subscribing to MLExpert Pro and enjoying the whole bootcamp; if you're already a subscriber, thank you. I have a Google Colab notebook that is already running, and it is using a T4 GPU, which is on the free tier of Google Colab. The first thing that I'm doing here is installing torch, the latest version as of now, 2.2.1, so this is going to install it in the Colab notebook I'm running. If you don't want to run in a Colab notebook, you can use your favorite Python package or environment manager and essentially do pip install torch, which should give you the latest version of torch. Of course, if you want GPU support, you need some sort of CUDA-enabled GPU; if you have an NVIDIA GPU that is, say, up to two, three, or four years old, and you have installed CUDA appropriately, this should be supported. If you have everything like that and you go ahead and import torch and look at the version of the library, you'll see something like this: it tells you that you're getting the expected version of torch, but also that it ships with CUDA support. To see why this is relevant: the machine here is running the current CUDA version, which is 12.2, and we have torch installed with CUDA 12.1 (cu121), so this should be compatible. In order to check that, I'm going to ask torch whether or not CUDA is available, and as you can see this returns True, which gives us a nice confirmation that the GPU is actually usable within the torch environment.
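For reference, the setup check described here boils down to a couple of lines; this is a minimal sketch, and the exact version strings will depend on your install:

```python
import torch

# Version string includes the CUDA build, e.g. "2.2.1+cu121"
print(torch.__version__)

# True when a CUDA-enabled GPU, driver, and torch build all line up
print(torch.cuda.is_available())
```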
The first thing that I'm going to show you is probably the most important concept of the PyTorch library, and that is the tensor. Tensors are something that I like to think of as data containers, and they can be n-dimensional; for example, if you have just a number, say 42, you can create a tensor of that, and this tensor is going to represent a scalar, which is represented via a tensor within the PyTorch library. Unfortunately, PyTorch doesn't work directly with, for example, NumPy arrays or regular Python lists, so in order to work with the PyTorch library, you actually need to create those tensors, and this is why you need to learn how to create them, or how to convert other data formats into tensors, which is what I'm going to show you right now. Let's start with the scalar. The easiest way to create a tensor is to call torch.tensor, and you can just pass in the scalar value that you want; I'm going to create a variable for that, and you can see that the output is a tensor of 42. This is essentially how you create tensors. The next thing that I'm going to show you is how to get the internal value of the tensor: I'm going to call item(), and this returns an integer, the wrapped value of the tensor we created here. This is very important if you want to get the value of a tensor and convert it into something that Python or NumPy can actually understand. The next thing is to check the type of the tensor. You might know that Python is, let's say, very loosely typed: if you're passing in an integer or something like that, you don't really have to care about the type of this number, or string, or whatever. But in torch you actually do need to care about the types, and you have ways to convert between types, for example from floats to integers, and so on. Another important thing that I forgot to mention is that tensors don't really work with strings, and let me show you what I mean by that: you can't actually create a tensor from a string. In general, you need to think of tensors as containers of numbers; those numbers can be of different types, but strings are not allowed. And pretty much every time you do something with neural nets or other models, you can't really work with strings directly; you need some way to convert those strings into numbers, so keep that in mind. To check the data type of our scalar, I'm going to call a property named dtype, and this returns torch.int64. One important thing here is that int64 is the default representation of this integer; it is chosen automatically when the torch tensor is created. Another important thing is that this is not just a Python int or any other built-in type: it is a type specific to torch, called torch.int64.
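Roughly, the scalar examples from this part look like the following sketch (expected outputs shown as comments):

```python
import torch

scalar = torch.tensor(42)
print(scalar)         # tensor(42)
print(scalar.item())  # 42 -- the plain Python value inside
print(scalar.dtype)   # torch.int64, the default integer dtype

# Strings are not allowed inside tensors:
# torch.tensor("42")  # raises an error
```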
If I evaluate this type by itself, you can see its representation. So how can you use this to convert, for example, this integer into a floating-point number? One thing you can do is call the method to(), and for the scalar I'm going to use float32. I'm going to create a variable for that, print out its value, and then print out the type of the new scalar; as you can see, the type is now float32, and in the representation of the item you can see that it is now a floating-point number. If I call the method item() on the float tensor, you see that we still have 42, but it is now a floating-point number. This matters if you are, for example, getting errors saying "we expected an integer here but you're giving us a float", or vice versa; doing something along these lines is pretty much a surefire fix, and PyTorch will probably work fine after that. This is very common when you're passing in features or labels during training, or into loss functions, and then doing calculations on them: it's pretty common to get an error that the types are mismatched, and then you need to do something along these lines to convert to the correct types. Next, I'm going to show you how to create a vector, and this is very easy to do: again, I call torch.tensor, and this time I pass in two numbers inside a Python list. Let's have a look at what we get: you can see that this is now represented as a wrapped list within the tensor. If I call the .shape property on it, you see that we have a single dimension, and within that dimension the tensor has two elements. This is the shape of the tensor, and it is very similar to what you have in NumPy, so if you're familiar with that, this should be pretty much exactly the same, except that instead of a tuple you're receiving this torch.Size structure, which represents the dimensionality of each axis of your tensor; since tensors can be n-dimensional, you can understand that this represents the size of each dimension. Another thing you can do here is call vector.dtype; this is the type of each element. Let's try something: what will happen if I mix a float and an integer in the same tensor? It works, but as you can see, PyTorch automatically converts the integer value into a floating-point number. So essentially what you can't do is create a tensor with different types inside; if I include 42 alongside floats, the 42 gets converted into a floating-point number. Torch is doing some smart stuff behind the scenes, but you need to have the same type for every element within a tensor. Okay, another thing that I want to show you is how to create, for example, a matrix. For this matrix, I'm going to take the vector from before as the first row, and then reverse it for the second row of the matrix. So now the matrix has two rows with two columns; this is a square matrix.
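Here is a sketch of the conversion, vector, and matrix examples just described, assuming the same values used in the video:

```python
import torch

# Convert the int64 scalar to float32
scalar = torch.tensor(42)
float_scalar = scalar.to(torch.float32)
print(float_scalar.dtype)   # torch.float32
print(float_scalar.item())  # 42.0

# A one-dimensional tensor (vector)
vector = torch.tensor([1, 2])
print(vector.shape)         # torch.Size([2])
print(vector.dtype)         # torch.int64

# Mixed int/float input gets promoted to float
print(torch.tensor([1.0, 42]).dtype)  # torch.float32

# A 2x2 matrix: the vector and its reverse
matrix = torch.tensor([[1, 2], [2, 1]])
```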
If you call matrix.shape, you see that again we have two dimensions, for the rows and the columns of this matrix, and in each dimension we have two elements. If I try to, let's say, put a third element into just one of the rows, we get an error, since the first list is no longer the same length as the second one; you can't make a ragged matrix, and PyTorch will not allow it. Tensors are great stores of values, but in order to do something useful with them, you need to learn how to run operations on them, or how to use those tensors to produce something usable, and for that PyTorch gives you a lot of operations and methods. I'm going to start with some helpers that I use pretty much every time I do some, let's say, more fundamental research. The first one is torch.zeros: you pass in a tuple, and this creates a matrix, or another kind of tensor depending on the dimensions you give it, filled with zeros. For example, if you want to initialize some dummy variables, or say some weights or something like that, you can use this as a shorthand for creating a tensor of zeros. Likewise, there is torch.ones to create the same thing but filled with ones. Another thing that I like to use is torch.rand, and here for the size I'm going to pass in something a bit different; in the tensor that you get, the elements are small numbers between zero and one, each one a random value. So this works as expected as well. Another important thing you can do is reshaping. For example, if you have this matrix with two rows and three columns and you want to convert it into three rows and two columns, you can do something like this: I create a tensor with 1, 2, 3 and then 4, 5, 6, and if I reshape it to three rows and two columns, you'll see that the new tensor is reshaped correctly when I call the reshape method on it. Another important operation is adding or removing dimensions. For example, let's say I have the tensor [[1, 42], [42, 1]]; if you look at its shape, you see that it is 2 by 2. If I want to add another dimension to this, I call unsqueeze, because most of the time, when you're doing operations on tensors, you can't perform them if the dimensions don't match; for example, if you're multiplying matrices or vectors, you need to have the correct dimensions in order to do the operations. So unsqueeze essentially adds another dimension to the tensor, and if I do this on dimension zero and have a look at the shape (sorry, I had a typo there: unsqueeze), you can see that we've added another dimension. If I remove the .shape call, you can see that what we did is add a new pair of outer brackets, which gives us the new dimension. Again, this is pretty useful when you're working with different shapes of tensors.
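A rough sketch of these helpers and the reshape/unsqueeze examples (the exact sizes passed in the video may differ):

```python
import torch

zeros = torch.zeros((2, 3))   # 2x3 tensor filled with zeros
ones = torch.ones((2, 3))     # 2x3 tensor filled with ones
random = torch.rand((2, 3))   # uniform random values in [0, 1)

# Reshape a 2x3 tensor into a 3x2 tensor
t = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(t.reshape(3, 2))

# unsqueeze adds a new dimension of size 1 at the given position
t2 = torch.tensor([[1, 42], [42, 1]])
print(t2.shape)               # torch.Size([2, 2])
print(t2.unsqueeze(0).shape)  # torch.Size([1, 2, 2])
```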
So let's say I have a tensor that is created like this, with torch.tensor and two rows of a matrix; in the first row I have the numbers 0.5 and 0.5, and in the second one 0.15 and then 0.85. Essentially you get this, and both of those rows sum to one. This is our example, and if I want to get the maximum value from all of these, t.max() returns the maximum over all of the tensor's values. If I want the maximum value for each row, I can call t.max and specify the dimension. Let me show you the dimensions here: dimension zero runs over the two rows, and dimension one runs over the elements within each row. So if I pass dimension one, this returns 0.5 and 0.85, and if I pass dimension zero, it returns the maximum of each column, something we don't really want in our case. So let me do this, and note that max with a dimension returns both the values and the indices of those values; this is really helpful when you want either the elements themselves or the positions of the elements. Most of the time, you are not going to be creating tensors from scratch; you'll probably use something like NumPy arrays, pandas, or just Python lists, and torch provides a lot of helpers for working with those. Let's say you have a NumPy array; I'm going to import NumPy as np, and let's say you have this NumPy array of 1 and 42. That is the array you get, and in order to convert it into a tensor, you call from_numpy and pass in the NumPy array. If you evaluate a, you'll see the NumPy array, and if you evaluate t, you now have a tensor created from NumPy. This is really useful when your arrays are already in NumPy. You can do pretty much the same thing for pandas, but it is a bit more involved, since torch doesn't have a direct integration with pandas, while the integration with NumPy is pretty good.
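The max and NumPy-conversion examples, reconstructed as a sketch; the row values are my best reading of the audio:

```python
import numpy as np
import torch

t = torch.tensor([[0.5, 0.5], [0.15, 0.85]])
print(t.max())                  # tensor(0.8500) -- global maximum
values, indices = t.max(dim=1)  # maximum within each row
print(values)                   # tensor([0.5000, 0.8500])
print(indices)                  # tensor([0, 1]) -- positions of the maxima

# NumPy array to tensor
a = np.array([1, 42])
print(torch.from_numpy(a))      # tensor([ 1, 42])
```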
Let's say we have this data frame: I'm going to create a data frame with a single column called numbers, holding 1 and 42, and let's have a look at what we have; you'll see this very simple data frame. In order to create our tensor from it, I'm going to call torch.tensor, take the numbers column, and call to_numpy on it; this is essentially the second way to create a tensor out of a NumPy array. You see here that I'm converting the... yeah, these are the numbers; oh, that's great, we're actually getting a preview of a plot of the numbers. So this is one way: convert the series, the numbers series of the data frame, into a NumPy array, and then pass that to torch.tensor; and you can check that calling from_numpy gives essentially the same result. So this is how to create tensors from a data frame. One of the most important advantages of using PyTorch is that you can run your operations on a GPU, which significantly speeds up your computation, depending of course on the GPU that you have; here, again, we're using a T4 GPU. Let me show you how you can put some tensors on the GPU and run some operations. The first thing, which I've already shown you, is to check whether CUDA-enabled acceleration is available in your PyTorch installation; you can see that it is True in our case, so we have the correct PyTorch version with the correct CUDA installation, and a GPU that is available to us. Next, I like to run something like nvidia-smi, querying the GPU memory used and formatting the output as CSV, and as you can see, right now we have just 3 megabytes of memory used on the GPU. Then let me get the device object that points to the GPU device in torch; this essentially gives us a handle for where we want to store our tensors and where the calculations are going to be carried out. I'm going to use a CUDA device if CUDA is available, otherwise a torch CPU device, and this simple if gives us the device that we want. Up until now, at least, if we call .device on our previous tensor, you'll see that we get the device type of cpu.
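A sketch of the pandas conversion and device selection described here; the nvidia-smi line in the comment matches the memory/CSV query the video describes:

```python
import pandas as pd
import torch

df = pd.DataFrame({"numbers": [1, 42]})
t = torch.tensor(df.numbers.to_numpy())  # same result as torch.from_numpy
print(t.device)                          # cpu -- tensors live on the CPU by default

# In Colab, the memory query looks something like:
# !nvidia-smi --query-gpu=memory.used --format=csv

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
print(device)  # cuda, on a machine with a working GPU setup
```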
Now, if I do something along these lines: I create a gpu_tensor variable, use the same torch.tensor call that we've used before, pass in an array, and then say device= and pass in the device. If I look at the tensor now, you'll see that where before we didn't have a device part, we now have device='cuda:0', so our tensor is stored on the first available GPU, the one with index zero. You might guess that if you have multiple GPUs, you can pick and choose which of them each tensor is going to be stored on; in our case we have just one, so this points to that device. Again, if I call gpu_tensor.device you can see that, and let me experiment with specifying 'cuda:0' explicitly: now you're giving the complete index of the CUDA device, so if you do have multiple GPUs, this tensor is going to be put specifically on device zero. So again, I'm going to query the GPU memory usage now that we have a single tensor on the GPU, and you can see that we've bumped this up by about 100 megabytes; not because of the single tensor itself, but because we're now using PyTorch on the GPU. This tensor alone does not do the damage, if you will, but all of the necessary PyTorch runtime pieces get put on the GPU, and this increases the memory usage you see; this is perfectly normal. Now let's say you have a CPU tensor, a tensor we've already created; this is our CPU tensor, pretty standard, something we've already worked with, and again it is not stored on the CUDA device. In order to put this tensor on a GPU, we use the same to() method that I showed you previously, but instead of a data type, I pass in the device. Let's have a look: you'll see that the device is now cuda:0, so this puts the tensor on the GPU, and it works as expected. All right, let me show you something a bit more interesting: what happens when you do something like cpu_tensor times gpu_tensor? These tensors should be compatible for the calculation, but this is the error that you get: essentially it says that both tensors need to be on the same device, while in our case we have a CPU tensor, which is of course on the CPU side, and a GPU tensor. What you can do to make this work: move the CPU tensor to the device, the CUDA device, and then multiply it by the GPU tensor; now you get the calculation result, this tensor right here, and as you can see, the resulting tensor is also on the device that you set, in our case the CUDA device. And if I go the other way and bring the GPU tensor to the CPU, you see that we get the same result, but this time the resulting tensor is not on the GPU. So you do need to keep in mind that you have to manage this process yourself when you're using PyTorch.
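Moving tensors between devices, as a minimal sketch; the array values are placeholders, not the video's exact numbers:

```python
import torch

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

gpu_tensor = torch.tensor([1, 2], device=device)
cpu_tensor = torch.tensor([3, 4])

# cpu_tensor * gpu_tensor  # RuntimeError: expected all tensors on the same device

result = cpu_tensor.to(device) * gpu_tensor  # computed on the GPU
print(result.device)                         # cuda:0

result_cpu = cpu_tensor * gpu_tensor.cpu()   # computed on the CPU
print(result_cpu.device)                     # cpu
```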
For the next part, I'm going to include some imports right here; these are going to help us with various tools, for example OrderedDict, some typing helpers, and matplotlib, and then from torch I'm going to import nn, the Adam optimizer (which we're going to use in a bit), the DataLoader, and the Dataset, and then tqdm in order to follow the progress of our loops. Another important thing that I'm going to show you here is how to set the seeds for your random number generators: first for the random library in Python, then for NumPy, and finally I'm going to use the manual_seed method from torch; I'm setting the seed to 42 for all of these. Once those imports are complete, I'm going to go down in the notebook and download a CSV file, whose contents I'm going to show you right now, starting by fetching it from a link. When you deal with real-world data, you might have something like this CSV file, or, let's say, something from a SQL database or another type of source; for example, it could be an API response or something like that. The first thing you need to do is to get this data into something that PyTorch can understand. So this is the data that I have; it's from Kaggle, and it represents a lot of data on calorie expenditure based on activity for different users. This is the user ID, and then you have all of those metrics for the particular day: sedentary minutes, lightly active minutes, fairly active minutes, very active minutes, etc., and then how many calories were expended based on those activities. If you want a deeper look at the data, again, go to MLExpert Pro, to the tutorial in which I explain the data in a bit more depth; but again, this is from Kaggle. In order to tidy up these header names, I'm going to use a pattern that converts them into something more pleasant along Python naming lines.
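For reference, a sketch of the imports and seeding described at the top of this part (the exact import list in the notebook may differ):

```python
import random
from collections import OrderedDict

import matplotlib.pyplot as plt
import numpy as np
import torch
from torch import nn
from torch.optim import Adam
from torch.utils.data import DataLoader, Dataset
from tqdm import tqdm

# Seed all three random number generators for reproducible runs
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
```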
So how can you use this data frame and convert it into PyTorch data? For that, I'm going to show you how to convert it into PyTorch datasets, and first I'm going to take the unique users: I call df.id.unique(), which returns an array of all unique user IDs; ah, I have a typo here, okay. Now that we have the user IDs, I want to put different users into train, test, and validation splits. To do that, I'm going to get the train IDs and temp IDs using the train_test_split function from scikit-learn, with a test size of 0.2, so 80% of the users are going to be reserved for training; of course, this is not a lot of data, but we have some examples here. The next thing is to get the test IDs and validation IDs, passing in the temp IDs, and for this I use a test size of 0.5. So essentially these are going to be the train, test, and validation IDs of the users, and each user is going to be in a different branch, a different subset, of the data. To get the actual data for those, I'm going to filter by the IDs, and isin should give us... let me just check here; yeah, this works: train, test, and then val for the validation. All right, it looks fine; let me check the shapes of each one. So we have 15 columns in each: 730 examples for the training data, 87 for testing, and then 123 for validation. It appears that even though we have the same number of users for test and validation, some users have a lot more data than others, so this is to be expected. Now I'm going to continue with how you can take this data frame and convert it into something that PyTorch can understand. In order to do that, I'm going to extend the Dataset class from the PyTorch library: I'm going to create a class called CaloriesDataset that extends Dataset, and essentially what you need to do here is have a __getitem__ method, which takes an index, and a __len__ method, which takes nothing except self; these are the required methods you need to override in order to build this dataset. I'm also going to add a constructor, and it takes a data frame. To see what we need, let's go back to the training data frame and have a look at the data. I'm going to use just two columns for the features, total distance and very active minutes, and then I want to predict the calories; maybe you could do some combination of all of the minutes columns or something like that, because very active minutes can be a very low number, but you can use these to have a look at how the calories are expended. So, to do all of that, I'm going to create a features property of the dataset, taking the total distance and the very active minutes, and for the labels I'm going to take the calories. For the length, the number of elements within the dataset, I simply return the len of the labels, and then for each item I need to convert these into tensors or numbers: for the features, I create a float tensor, and into it I put the features at the given index position, converted into a NumPy array; this is the floating-point tensor we're creating. Then for the label, I take the label at the index that was given to us, and finally I return the item, which is the features and then the label.
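A sketch of the dataset class as described; the column names (total_distance, very_active_minutes, calories) are my assumption about the notebook's cleaned headers:

```python
import torch
from torch.utils.data import Dataset

class CaloriesDataset(Dataset):
    def __init__(self, df):
        # Two feature columns and the calories label, per the video
        self.features = df[["total_distance", "very_active_minutes"]]
        self.labels = df.calories

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        features = torch.FloatTensor(self.features.iloc[idx].to_numpy())
        label = self.labels.iloc[idx]
        return features, label
```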
Let's run this. All right, let's have a look at what we have: if I create an instance of this dataset, CaloriesDataset, pass in the train data frame, and get the first element of the dataset... there we go. You can see that we get a tuple containing the features, a floating-point tensor with a vector in it, and then an integer, which is the label right here. So essentially what you get is the features and the label for this particular item, and this appears to be working quite fine. Then I'm going to create the test dataset and then the validation dataset; let's actually swap those. All right, so we have the three different datasets, and now I'm going to wrap them in DataLoaders. The important thing DataLoaders do is create batches, in order to train your models much more efficiently: instead of getting just a single example per iteration when training your model, you can specify a batch size, which is the number of examples from the dataset used per training step. This can significantly speed up your training, of course, and will probably also give you better models overall; but to choose a batch size, you essentially need to wrap your datasets in DataLoaders, and this is pretty easy with PyTorch. For the train DataLoader, I create an instance of DataLoader, and in here I specify the dataset, in our case the train dataset; I specify a batch size equal to eight; I want it to shuffle the data; then num_workers, which is essentially the number of worker threads we're going to use to load the data; and the final parameter is drop_last=True, which gives us a DataLoader in which each batch has exactly eight examples; if you don't want that, you can remove it. The shuffling gives us shuffled results from the dataset, which of course is going to break any dependencies based on the ordering of the user data. Then for the validation loader I do essentially the same thing; let me just copy and paste that, with, of course, some differences: we're not going to use the train dataset here, and I don't want this one to be shuffled. And then exactly the same thing for the test loader. We don't want any shuffling within the validation and test sets, since we don't really need it there; you can actually shuffle them, but it's not needed, because the data there shouldn't provide you with feedback for improving your model, only for evaluating it. All right, let's run through this; sorry, what I mean is that you don't really care about the order because you don't use the errors there to improve your model; of course, you can overfit on your own if you're training the model multiple times with some of that data, so you might try shuffling, but essentially this is what I do. So how do you get a batch out of this loader? I'm going to get the features and the labels by enumerating over the train loader, print the features, print the labels, and break, because I just want a single example. You'll see that this returns the tensor with the features; these are the first eight examples, with the total distance and very active minutes. You can see that we have a lot of zeros right here, so again, you might want to try something a bit different, maybe select other features or do some feature engineering; that is not the point of this video, of course.
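The loaders and the batch peek, roughly as described; the num_workers value and the train_df/val_df/test_df names are my guesses, since the video doesn't state them:

```python
from torch.utils.data import DataLoader

train_dataset = CaloriesDataset(train_df)
val_dataset = CaloriesDataset(val_df)
test_dataset = CaloriesDataset(test_df)

train_loader = DataLoader(
    train_dataset, batch_size=8, shuffle=True, num_workers=2, drop_last=True
)
val_loader = DataLoader(val_dataset, batch_size=8, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=8, num_workers=2)

# Peek at a single batch
for i, (features, labels) in enumerate(train_loader):
    print(features.shape)  # torch.Size([8, 2])
    print(labels)          # a tensor of 8 integer calorie values
    break
```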
And these are essentially the labels; one important thing to see right here is that the actual output for the labels is now a tensor. There is a batching (collate) function that does the loader's job of giving us tensors for each batch, and as you can see, the integers from the labels are converted into a tensor with the batch size of eight, whose elements are again integers, but now wrapped within the tensor. Now that we have the data converted into datasets and DataLoaders, we can continue with building a model, and for that I'm going to use the sequential way of building PyTorch models. This one is very nice compared to other approaches, such as extending nn.Module or creating a ModuleDict or ModuleList, and I like this approach when building somewhat simpler models, such as the one we're going to create right here. To create the model, I use nn.Sequential, and into it I pass an OrderedDict, which takes a list, and within that I pass in some tuples. The first layer I'm going to call hidden layer one, and it is a linear layer; if you don't know what a linear layer is, it is pretty much a fully connected neural network layer. It requires a number of input features, and in our case we have only two features, so I pass in two; as the output features I specify 64. This is the place where you decide how many elements, or weights, or parameters you want your neural network to have, and it takes some experimentation, some trial and error, to get better results; with some experience you can find these more easily, is what I mean. I like them to be some power of two; of course this is not strictly required, but this is how I've found them to work really well. Then I pass in an activation function; in our case, nothing crazy, just a ReLU, which actually breaks the linearity of our neural network, and this will hopefully increase the performance of our model. Next, I pass in the next hidden layer, and in this case the number of input features is 64, the same number we had before as outputs, and then I reduce the number of outputs to 32; after that, activation two, again a ReLU. Then I create another layer, called the output layer, which takes the 32 and outputs a single number: the number of calories we're predicting. And this is pretty much the model; if I have a look at what we got, this is the model representation. One great thing about the way we've initialized the model is that we've named each layer, and of course this is really useful when you're looking at the model itself; this is a pretty nice representation of what we built, and pretty readable in my opinion. This is why I really love this way of building models; of course, it is not as extensible as extending nn.Module or something like that, but in simpler cases it is pretty much amazing, since I love how easy it is to read this type of code, and the output is really nice and easy to understand as well.
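A reconstruction of the sequential model; the layer names are paraphrased from the video's naming:

```python
from collections import OrderedDict

from torch import nn

model = nn.Sequential(
    OrderedDict(
        [
            ("hidden_layer_1", nn.Linear(2, 64)),
            ("activation_1", nn.ReLU()),
            ("hidden_layer_2", nn.Linear(64, 32)),
            ("activation_2", nn.ReLU()),
            ("output_layer", nn.Linear(32, 1)),
        ]
    )
)
print(model)  # each layer shows up under the name we gave it
```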
Let me show you how you can query individual layers: hidden layer one, for example. I can get its weights, and on those I can also call shape; you see right here that the shape corresponds to the input features, two, and the output features, 64. If I go to the next layer, you might expect 64 and 32, and it's exactly that. So this is a representation of the weights, the parameters, that were initialized, and if you take a look at some of the weights, for example the first 10 values of the first layer, you'll see the initial weights of our model: they were created by these constructors and initialized internally to some very small random numbers; as you can see, nothing is very large and nothing is very negative. Those numbers are set by a default initializer. Then I'm going to paste in a bit of code to show you what else we can do with our models: it essentially builds a data frame of the layers (I'm skipping the activation layers here; maybe you could include those as well). You see that we have 128 weight parameters for the first layer, which is exactly 2 * 64, and the number of bias parameters equals the number of output features, 64. For the next layer we have 64 * 32 weights and then 32 bias parameters, and exactly the same pattern for the output layer. This is again a nice way to represent the architecture of your model, and I really love to do this: it gives you a way to summarize what your model architecture is. For example, if you have something very much larger, like BERT or a small LLM or something like that, you can make such visualizations, if you will, and understand a bit better what is behind the model you're using. Of course, with even a small LLM it is very, very hard to understand the complete architecture of the model, but with some smaller parts of those models you can get a pretty good feeling for what is happening behind the scenes, or how they are actually implemented in PyTorch. In order to have a look at the predictions of this empty, or rather untrained, model, I'm going to iterate over the DataLoader, the train loader that we have, for the features and labels, enumerating over the train loader as I've already shown you, and to get predictions from the model, I just call it as if it were a function, passing in the features. Keep in mind that the features in our case are actually a batch of eight; let me just show you that, since it's particularly important when you're working with PyTorch. So we have eight examples, and each example has two features, and if I actually run the predictions, you'll see that the model handles this. What is happening behind the scenes is that torch is just fine with features that come in batches; it is expected that the first dimension is the batch size of eight, and even the help here is telling us that this is the case. So PyTorch internally allows the whole batch to be run through here.
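A sketch of the layer inspection, the parameter-count table, and the untrained-model predictions; the table-building code is my own rough stand-in for the snippet pasted in the video, and it reuses the model and train_loader from the earlier sketches:

```python
import pandas as pd
from torch import nn

# Named layers are reachable as attributes on the Sequential model
print(model.hidden_layer_1.weight.shape)  # torch.Size([64, 2])
print(model.hidden_layer_2.weight.shape)  # torch.Size([32, 64])
print(model.hidden_layer_1.weight[:10])   # small random initial values

# Parameter counts per linear layer, as a small summary table
rows = [
    {
        "layer": name,
        "weight_params": module.weight.numel(),  # e.g. 2 * 64 = 128
        "bias_params": module.bias.numel(),      # equals the output features
    }
    for name, module in model.named_children()
    if isinstance(module, nn.Linear)
]
print(pd.DataFrame(rows))

# Predictions from the untrained model on one batch of 8 examples
for features, labels in train_loader:
    print(model(features))  # shape (8, 1); values are meaningless so far
    break
```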
Let's have a look at the predictions: you can see that the predictions are quite bad compared to the labels that we have in the batch, but we'll see that after some training they improve a lot. Now that we have a model, let's continue with how to train the model itself. One of the first components I want is a loss function, which is going to be used to evaluate how well the model is currently doing; in our case I'm going to use a Huber loss, which is quite good at navigating between very small errors and very large errors. You might note that the actual calories are very large numbers; one way to escape from that is to do some scaling of the labels and the inputs, but in our case I'm just going to train the model as it is. In a real-world example, scaling might be very advisable here, but that would be beside the point of learning PyTorch. The next component is the Adam optimizer; we're going to use this optimizer to update the parameters based on the loss function and essentially make our model learn better over time, and it takes a learning rate of 0.001, which is its default value. Then I take the model and move it to the GPU device, since this is where I want the training to happen. These three are the components that we need in order to start the training process. For the training loop itself, I'm going to run the training for 100 epochs; what this means is that our model is going to have a look at the full training data 100 times. I'm going to store the best validation loss, which is going to be minimized over time; I'm also going to save the state of the best model; next, I want to store the training losses and the validation losses in lists; and then, for each epoch in the number of epochs, I want to do a couple of things. First, I calculate the training loss using a train_one_epoch function (I have a typo here), which I'm going to show you how to write in a minute, passing in the train loader, loss function, optimizer, and device. Then I get the validation loss by calling a validate function, which again I'm going to write in a minute, with the validation loader, loss function, and device. Then, if our validation loss is smaller than the best validation loss thus far, we update the best validation loss (you pretty much know how to write a minimum), and I also set the best model state to a copy of the current model's state. And then I simply append to the train losses and the validation losses right here. Of course, if you run this right now, we don't have the next two functions that we need, but overall this is the flow I'm going to use to train the model: train the model on one epoch of training data, validate it, and if the validation loss is better compared to what we had before, store it, and also store the best model. I also store these values in a sort of history of training losses and validation losses, and I'm going to show you why we need that in a minute.
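The training components and loop as a sketch; the train_one_epoch and validate signatures are my reconstruction (the functions themselves appear in the next sketch), and model, device, and the loaders come from the earlier sketches:

```python
import copy

from torch import nn
from torch.optim import Adam

loss_fn = nn.HuberLoss()                        # balances small and large errors
optimizer = Adam(model.parameters(), lr=0.001)  # 0.001 is Adam's default
model = model.to(device)                        # train on the GPU

N_EPOCHS = 100
best_val_loss = float("inf")
best_model_state = None
train_losses, val_losses = [], []

for epoch in range(N_EPOCHS):
    train_loss = train_one_epoch(model, train_loader, loss_fn, optimizer, device)
    val_loss = validate(model, val_loader, loss_fn, device)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_model_state = copy.deepcopy(model.state_dict())
    train_losses.append(train_loss)
    val_losses.append(val_loss)
```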
Next, I have the validate function and the train_one_epoch function; let me just copy and paste those, since it would take a lot of time to write them out here. This is the train_one_epoch function: here we are essentially iterating over the features and the labels from the train loader, putting them on the device that is passed in, in our case the CUDA device; then I'm zeroing out the gradients on the optimizer, which ensures that the gradient calculations for the loss function are correct; then I'm calling the model with the features, predicting the outputs based on the current features; then, sorry, I'm squeezing out the extra dimension of the outputs, and these are the predictions; then for the labels, I'm converting them into floating-point numbers, which is one of those places where converting to floating point lets our loss function do the calculations correctly. After the loss is calculated between the predictions and the labels that we have, the error is propagated back through the model using loss.backward(); then we are advancing the optimizer, which essentially updates our model's weights; then we take the current loss and multiply it by the size of the features batch, accumulating it in a variable; and finally I return the average loss, thanks to dividing the accumulated value by the number of elements within the dataset. This is how we calculate the training loss. Then we have another function, which we've called validate, and I'm going to show you its contents; it is a bit simpler. In train_one_epoch we started with model.train(), while in validate I'm putting the model into evaluation mode and calling torch.inference_mode(); this turns off any gradient calculations and so on, and will probably speed up the inference steps of your model. We go through the validation loader and do pretty much the same thing: get the predictions, calculate the loss, and accumulate it; note that we are not using zero_grad, an optimizer, or backward propagation here, since we're just evaluating the model thus far. I would say this is a stripped-down version of the train_one_epoch function.
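A reconstruction of the two pasted functions, under the signatures assumed above:

```python
import torch

def train_one_epoch(model, loader, loss_fn, optimizer, device):
    model.train()
    total_loss = 0.0
    for features, labels in loader:
        features, labels = features.to(device), labels.to(device)
        optimizer.zero_grad()                     # reset accumulated gradients
        outputs = model(features).squeeze(dim=1)  # (batch, 1) -> (batch,)
        loss = loss_fn(outputs, labels.float())   # loss needs float labels
        loss.backward()                           # backpropagate the error
        optimizer.step()                          # update the weights
        total_loss += loss.item() * features.size(0)
    return total_loss / len(loader.dataset)

def validate(model, loader, loss_fn, device):
    model.eval()
    total_loss = 0.0
    with torch.inference_mode():                  # no gradient tracking
        for features, labels in loader:
            features, labels = features.to(device), labels.to(device)
            outputs = model(features).squeeze(dim=1)
            loss = loss_fn(outputs, labels.float())
            total_loss += loss.item() * features.size(0)
    return total_loss / len(loader.dataset)
```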
Let's see if this is actually working: you can see that the training is happening, and while it's not extremely fast, it appears to be working just fine. The training took roughly a minute to complete; we did 100 epochs with that. For the next part, I'm going to show you how you can get a feel for how well the model is doing. The first thing I'm going to do is plot the train and validation losses: I call plt.plot on the train losses, then plt.plot on the validation losses with a label of validation; then I label the y-axis "loss", label the x-axis "epoch", and call legend on it. This is the result of the training: the chart shows, for each epoch, the value of each of the losses, and you can see that the validation loss has pretty much stayed flat after, let's say, 40 epochs, while the training loss kept going down. This is something you might have trouble with: if the model keeps lowering the training loss while the validation loss pretty much stays the same, or even starts to go up, that is known as overfitting; even though the model is memorizing the training data, it has a lot of trouble predicting on new data. This is essentially what is happening here, or what would be happening if you, for example, continued training this model; probably, though I can't really be sure about that. Another important thing to note here is the validation losses themselves: let me show you the minimum of the validation losses, which is this number, while the last validation loss is, let's see, a bit higher. The minimum validation loss is probably where your model was at its best, and this is why we took the best model state and stored it in a variable. In order to load this model, I create a variable called best_model and copy the current model that we have, and on the best model I load the state dict of the best model state; this essentially takes the parameters and matches them with all of the keys. When you get this kind of message, "All keys matched successfully", it means you're doing the correct thing; it pretty much verifies that we are taking the model, copying it, and loading the state dict of the best model and its parameters along with it. So, let's have a look at how we can get some predictions from our model. First, I put the model into evaluation mode, and I'm going to collect the predictions and save all the labels; recall that I'm using torch.inference_mode() to somewhat speed up the inference. I iterate over all of the examples within the test loader, call the model with the features, passing the features to the device, which in our case is the CUDA device; for the predictions, I append them to the list, and for the labels, I again take the labels and build a list of those. You can see that the inference is actually quite fast; we have just around ten batches, so again, this is pretty fast. Let's have a look at what we have in our predictions: compared to what we had with the untrained model, you can clearly see that the numbers are no longer anything like 0 or 0.1 or something like that; those numbers are much improved now, and you can see that the model has at least learned something; we still have to check whether this something is actually useful as a prediction. Let me take all of the labels that we have and convert them into a list; this essentially merges them into a single tensor, flattens it, and turns it into a list, giving us the labels list. I do essentially the same thing for the predictions, since those are also tensors. Here, let me check the lengths of the labels and the predictions, just to make sure that everything is all right.
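A sketch of the loss plot, checkpoint restore, and test-set inference described here, reusing names from the earlier sketches:

```python
import copy

import matplotlib.pyplot as plt
import torch

plt.plot(train_losses, label="train")
plt.plot(val_losses, label="validation")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend()
plt.show()

# Restore the best checkpoint and predict on the test set
best_model = copy.deepcopy(model)
best_model.load_state_dict(best_model_state)  # "<All keys matched successfully>"
best_model.eval()

predictions, all_labels = [], []
with torch.inference_mode():
    for features, labels in test_loader:
        outputs = best_model(features.to(device)).squeeze(dim=1)
        predictions.append(outputs.cpu())
        all_labels.append(labels)

labels_list = torch.cat(all_labels).flatten().tolist()
predictions_list = torch.cat(predictions).flatten().tolist()
```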
Yep, 88 examples each. Then I'm going to show you one view of the results; I'm going to paste it in, so it's going to be a bit simpler to understand: I create a scatter plot between the labels and the predictions, and this is what you get. On the y-axis are the predictions, and the true values, the labels, are on the x-axis, and you can see that the model is predicting most of the values very well; as long as a point is on or very close to the diagonal, the model is doing a pretty good job. But we have some examples whose predictions are very far off compared to the real or true values, so there might be room for improvement here: to create better models, or maybe to choose better features, in order to predict those types of examples. And this is it for this video. We've seen how you can get started with PyTorch, at least the latest version of PyTorch thus far; we've seen what tensors are, how you can do operations on tensors, and how you can take some real-world data and use it to train your own model from scratch with PyTorch. Thanks for watching, guys; please like, share, and subscribe, and also join the Discord channel, which I'm going to link down in the description. I'll see you in the next one, bye!
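And for completeness, a sketch of the final scatter plot; the identity line is my addition for reference, not something the video shows:

```python
import matplotlib.pyplot as plt

# Predictions vs. true calorie values; points near the diagonal are good
plt.scatter(labels_list, predictions_list)
lims = [min(labels_list), max(labels_list)]
plt.plot(lims, lims, color="gray")  # identity line for reference
plt.xlabel("true calories")
plt.ylabel("predicted calories")
plt.show()
```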
Info
Channel: Venelin Valkov
Views: 1,287
Keywords: Machine Learning, Artificial Intelligence, Data Science, Deep Learning
Id: dgs_9quxZXk
Length: 68min 25sec (4105 seconds)
Published: Sun Mar 17 2024