PyTorch Lightning Tutorial - Lightweight PyTorch Wrapper For ML Researchers

Captions
Hey guys, welcome to a new PyTorch tutorial. Today we'll be talking about PyTorch Lightning. PyTorch Lightning is a lightweight PyTorch wrapper. It aims to reduce boilerplate code, so that implementing an algorithm, and also improving your model (tweaking and optimizing it), can be much faster. You don't have to remember all the tiny details of the PyTorch framework, because Lightning takes care of all that. Another great feature is that it prints out warnings and gives you helpful machine learning tips or hints if you make mistakes; we will see all of that later.

Usually I'm skeptical when it comes to wrapper frameworks that abstract away a lot of things, but this time I can really recommend it. I still advise that you learn the underlying basics first, and if you haven't yet, you can check out my free PyTorch course here on YouTube; the link is in the description. But if you are already familiar with PyTorch, then check out PyTorch Lightning and see if you like it or not. Today I'll be converting one of the scripts from my PyTorch course to PyTorch Lightning, so you can get a feel for how this framework works and how you can use it.

All right, let's jump into it. First of all, PyTorch Lightning is open source, so you can find it on GitHub, and they also have a website with nice documentation that should get you started. I will put the links in the description as well. Here are some of the details that you no longer have to worry about with PyTorch Lightning: when to set your model to training or evaluation mode; defining a device for GPU support and then pushing your model and all your tensors to that device (you can easily turn on GPU or even TPU support with Lightning and scale it up); calling optimizer.zero_grad(), backward(), and optimizer.step() yourself; and using torch.no_grad() or detach(). As a bonus you get integrated TensorBoard support, and Lightning also prints out tips and hints, which we will see later.

So let's jump to the code. For this example we'll be taking one of the scripts from my PyTorch tutorial series, which you can find on GitHub; in this case we take tutorial number 13. This is a simple feed-forward neural net applied to the MNIST dataset to do digit classification. Now let's grab some of that code and write a new script with PyTorch Lightning. Let me delete this and start with a fresh Python script. By the way, when you want to install PyTorch Lightning you have two common options: the first one is to use pip, so you just say pip install pytorch-lightning, or if you're using conda you can grab the corresponding conda command. I already did this in my terminal, so PyTorch Lightning is already installed here.

Let's go back to the code and copy and paste some things. We want all the same import statements, and as an addition we now also import PyTorch Lightning: we say import pytorch_lightning as pl. Now we want to convert our model, our neural net, to a Lightning model. We grab this code and paste it in here, and instead of deriving from nn.Module we now derive from pl.LightningModule. This gives us the same functions as the original model, but also some more functions, which we will see in a second. The __init__ function is still the same, and the forward function is also still the same.
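As a reference, here is a minimal sketch of the converted model. The class and layer names (LitNeuralNet, l1, relu, l2) follow the original feed-forward tutorial script; treat this as an illustration of the conversion rather than the exact code from the video.

import torch
import torch.nn as nn
import pytorch_lightning as pl

class LitNeuralNet(pl.LightningModule):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        # same layers as the plain nn.Module version
        self.l1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.l2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # forward pass is unchanged from the original model
        out = self.l1(x)
        out = self.relu(out)
        out = self.l2(out)
        return out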
We then need all the hyperparameters, so let's grab those as well and paste them in here. Now we need to implement a few more functions, so let's quickly go to the official website. There you see a step-by-step guide, and by the way, you also get a nice comparison of how plain PyTorch code looks versus PyTorch Lightning and what it abstracts away. As you can see, we also need to define a training_step, a configure_optimizers function, and a train_dataloader, so let's copy all of these in here.

Let's start with the simplest one, configure_optimizers. There you simply put in the optimizer that you created. In our original script we set up the optimizer here, the Adam optimizer, and then you return it, so we can do this in one line: we return our Adam optimizer with the learning rate from our hyperparameters.

Then we want to implement the training_step. In the training step we do exactly what we did in our training loop, except that we no longer have to worry about the for loops, about unpacking our images and labels, or about optimizer.zero_grad(), backward(), and the optimizer step. We simply grab this part and paste it in here. As you see, the training_step function gets a batch and a batch index, so we unpack our batch into our images and our labels. Then we have to reshape our images, but we no longer have to push them to the device, so we remove all of those calls. Then we do the forward pass, and here we can actually use self, because we are inside our model class. Then we apply the criterion. In our original code we set up the criterion as nn.CrossEntropyLoss, but we can instead use the functional module: for this we say import torch.nn.functional as F, and down here we say loss = F.cross_entropy with our outputs and the labels. For now we only want to return a dictionary with this loss; PyTorch Lightning needs this dictionary so that it can show you the loss during training. This is all we need in our training_step.

Then we also have to implement the train_dataloader function. Here we want to do what we did at the beginning, where we set up the training dataset and the training data loader. Let's copy both into the function, first the train dataset and then the train data loader, and then we return the train loader.
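Sketched out, the three methods look roughly like this. They go inside the LitNeuralNet class above; learning_rate and batch_size are the globals from the original script, and the torchvision imports are assumed at the top of the file. The num_workers=4 argument anticipates a Lightning tip discussed below.

import torchvision
import torchvision.transforms as transforms
import torch.nn.functional as F

# these methods belong inside the LitNeuralNet class
def training_step(self, batch, batch_idx):
    images, labels = batch
    images = images.reshape(-1, 28 * 28)     # flatten the images; no .to(device) needed
    outputs = self(images)                   # forward pass via self
    loss = F.cross_entropy(outputs, labels)  # functional criterion instead of nn.CrossEntropyLoss
    return {'loss': loss}                    # Lightning expects this key

def configure_optimizers(self):
    return torch.optim.Adam(self.parameters(), lr=learning_rate)

def train_dataloader(self):
    train_dataset = torchvision.datasets.MNIST(
        root='./data', train=True, transform=transforms.ToTensor(), download=True)
    return torch.utils.data.DataLoader(
        dataset=train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)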
Now this is all we need to start a training. Next we say if __name__ == '__main__', and then we have to set up a Lightning trainer. For this we import the Trainer: from pytorch_lightning import Trainer. When we set up the trainer, I can show you a trick that is very helpful during development: you pass fast_dev_run=True. This runs a single batch through training, and also through validation if you have a validation step, and with this you can test whether your model works at all. Now we also have to set up our model, so we say model equals LitNeuralNet, with the "Lit" prefix to make clear that this is a Lightning model, and pass the hyperparameters: it needs the input size, the hidden size, and the number of classes, which we all have in our hyperparameters. Then we fit the model with our trainer by saying trainer.fit(model), and now we can already run it and see if it works.

Let's go to the console and run the file: python lightning.py. Here we get an error; of course, I renamed the class to LitNeuralNet, so I also have to use that name here. Let's run it again, and now we see it starting. We see that we have no GPU support, we get an overview of our different layers and their parameters, and since I'm running it for the first time it downloads the dataset, and then here is the training. So it worked; it only did one batch, because we set fast_dev_run=True. Let's clear the console and test it one more time. Now it shouldn't download anything anymore, and this run was much faster.

Here we get one warning, for example. It says: the dataloader train_dataloader does not have many workers, which may be a bottleneck; consider increasing the value of the num_workers argument, try 4. This is actually the first tip that can help us speed up our code, so in our train loader we also give it the argument num_workers=4. Let's clear this and run it again, and now the warning is gone. Here we see the overview; we only used one epoch, and here we have the loss.

If you want a full training, you can also give the trainer the argument max_epochs= followed by the number of epochs we defined, num_epochs, which is one of our hyperparameters. Now let's set fast_dev_run to False again; this should do a full training. Let's see if it works. This will take a little longer, and yeah, we get a nice progress bar, our training works, and the loss should slowly decrease. So this is working; let me stop it for now.
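Putting the driver code together, the main block looks roughly like this (a sketch; the commented-out lines show trainer arguments that come up in this tutorial):

from pytorch_lightning import Trainer

if __name__ == '__main__':
    model = LitNeuralNet(input_size, hidden_size, num_classes)
    # fast_dev_run=True pushes a single batch through training (and validation,
    # if defined) as a quick smoke test during development
    trainer = Trainer(fast_dev_run=True)
    # trainer = Trainer(max_epochs=num_epochs)           # full training run
    # trainer = Trainer(max_epochs=num_epochs, gpus=1)   # GPU support, mentioned later
    trainer.fit(model)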
The next thing we want to do is our evaluation, or testing. Similar to the training step and the train dataloader, we now also want to add a validation step and a validation dataloader. Let me copy the training step and paste it in here; this has to be called validation_step, and in it we actually do the same thing as in the training step: we reshape our images, then we do the forward pass and calculate the loss, and this time we use the key val_loss for the validation loss. Now let's set fast_dev_run to True again and run it. We should get an error, and yeah, here we get the exception: you have defined a validation_step but have not passed in a validation dataloader. So now we know what we have to do: we also need a validation dataloader, and it has to be called val_dataloader. If you're not sure, you can go to the official documentation and scroll to the validation loop; there you see the three functions that you need.

First we need our validation dataset. Here we grab the test dataset; in this case I call it the test dataset, but actually you want to split your data into a training dataset, a validation dataset, and a test dataset. The validation dataset is used to get an unbiased evaluation of the model's performance while tuning the hyperparameters, and the test dataset is used at the very end to provide an unbiased evaluation on data the model has never seen before. In this example, let's say we only have two splits of our dataset, and we only use the first two: training and validation. So let me copy this, paste it in here, rename it to a validation dataset, then grab the validation data loader, rename it to val_loader, and return the val_loader.

Now let's run it again with python lightning.py. We get an error: dataset should equal val_dataset, of course. Let's clear this and run it again; it should do one pass through training and validation, and we see it worked. We again get the same num_workers warning, so here we also set num_workers=4; let's clear the console and try again. You see, Lightning already starts giving us hints when we make mistakes. Now it worked, and it is also suggesting that we define a validation_epoch_end method for accumulating stats. Let's go to the documentation and grab this function to see what it looks like, and copy and paste it. This is the function that is executed after each validation epoch, and here we want to calculate the average loss. In our example we want to do exactly that: we calculate the average loss with torch.stack, and since we used the key val_loss we can access it here and take the mean over all the losses. For now let's not worry about anything else; we return the average loss as a dictionary, and then we are done. If we clear the console and run it, we get one clean pass with no more warnings, and now we have everything we need for our code. This is all we need with PyTorch Lightning.

Now let me show you one more hint we can get. If, in our validation loader, we accidentally set shuffle=True and then try to run it, we should get a warning, and yeah, here we see the user warning: your val_dataloader has shuffle=True; it is best practice to turn this off for validation and test dataloaders. So you see, we can actually already improve our code with this framework.
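Here is a sketch of the validation pieces, written in the dict-returning style this video uses (Lightning 0.x; newer Lightning versions report metrics with self.log() instead). As before, these methods go inside the LitNeuralNet class:

def validation_step(self, batch, batch_idx):
    images, labels = batch
    images = images.reshape(-1, 28 * 28)
    outputs = self(images)
    loss = F.cross_entropy(outputs, labels)
    return {'val_loss': loss}

def val_dataloader(self):
    # train=False gives the MNIST test split, used here as the validation set
    val_dataset = torchvision.datasets.MNIST(
        root='./data', train=False, transform=transforms.ToTensor())
    return torch.utils.data.DataLoader(
        dataset=val_dataset, batch_size=batch_size, shuffle=False, num_workers=4)

def validation_epoch_end(self, outputs):
    # average the per-batch losses collected from validation_step
    avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
    return {'val_loss': avg_loss}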
Now let's compare this with our original code. If we look at that code on GitHub: we no longer need the device setup, we no longer need the manual for loops over all the epochs and all the batches, we no longer have to care about the two .to(device) calls, we no longer have to care about the optimizer calls and the backward pass, and we can leave out the whole manual validation loop, because we calculate that with our validation_step. And if you also want a test step, for example, you can easily add it the same way as we did with these two functions.

Now, if you want to use GPUs, you can easily do that by giving the trainer the argument gpus=1, and when you want to scale up and have more GPUs available you can use 2 or, let's say, 4, or you can even use TPU support. So it's really easy to switch from CPU to GPU and then scale up. You can also set a distributed backend, DDP for example, you can easily switch to 16-bit precision, and you can log the GPU memory usage. There are a lot of features that you can now very easily try with this framework; you just have to use the different arguments of the Trainer. One more helpful option is auto_lr_find=True, which runs an algorithm to find the best learning rate. You can also set deterministic=True, which lets you reproduce your results, or apply gradient clipping very easily by passing in the argument gradient_clip_val= followed by some value, between 0 and 1 I guess. For all of this, check out the official documentation, where you find all the different features and use cases.

Now let me show you the full training and validation. Let's remove this and set fast_dev_run to False again. I'm not using a high number of epochs, only two, because it takes a long time without a GPU, but let's run it anyway. Again we see the progress of our first epoch and how our loss decreases, and when this is done we see a quick progress bar for the validation loop. So watch closely: here we see the validation loop, then our second training epoch starts, then the second validation, and now it's done and we can inspect the final loss. So this worked.

As the last thing, let me show you the integrated TensorBoard support. As you can see, there is already a folder called lightning_logs, so Lightning is already automatically saving some checkpoints. If you want to inspect the different losses in TensorBoard, then, for the training for example, we create another dictionary in our training_step. Let's call it tensorboard_logs; it uses train_loss as the key and the same loss as the value. Then we also add the TensorBoard logs to the returned dictionary under the key 'log'. We do the same in validation_epoch_end: here we create the tensorboard_logs dictionary with, let's call it, avg_val_loss as the key and the average loss as the value, and again we add it under the key 'log'.

Now let's run it one more time to save all of this, and wait until it's done. Our training and validation are finished, so everything should be saved to the log directory, and we can start TensorBoard by saying tensorboard --logdir=lightning_logs. By the way, I also have a full tutorial about how to work with TensorBoard; I will put the link in the description as well. With Lightning you don't have to install TensorBoard manually, because it comes with Lightning's requirements, so this should work automatically. If we hit run... sorry, I mistyped the flag; with --logdir=lightning_logs it works, and now our TensorBoard is running. If we open it up, we can inspect the epochs: we have the average validation loss, and since we only used two epochs we don't see much of a difference here, it's only a short line. But for the training loss, which we logged after each training step, we see a nice graph of how our training loss decreased. So you see, we have automatic TensorBoard integration, and it's very easy to work with this tool.

These are most of the features I wanted to show you. Let me know in the comments if you like this framework, and whether you also think it can simplify your development process. If you like this video, please hit the like button and consider subscribing to the channel; this helps me a lot. See you next time, bye!
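For reference, the logging changes described above look roughly like this in the dict-based style of the Lightning version used in the video (in Lightning 1.0 and later you would call self.log('train_loss', loss) instead of returning a 'log' dictionary):

def training_step(self, batch, batch_idx):
    images, labels = batch
    images = images.reshape(-1, 28 * 28)
    outputs = self(images)
    loss = F.cross_entropy(outputs, labels)
    tensorboard_logs = {'train_loss': loss}            # logged after every training step
    return {'loss': loss, 'log': tensorboard_logs}

def validation_epoch_end(self, outputs):
    avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
    tensorboard_logs = {'avg_val_loss': avg_loss}      # logged once per validation epoch
    return {'val_loss': avg_loss, 'log': tensorboard_logs}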
Info
Channel: Patrick Loeber
Views: 50,211
Keywords: Python, Machine Learning, ML, PyTorch, Deep Learning, DL, Python DL Tutorial, PyTorch Tutorial, Tensorboard, PyTorch Course, PyTorch Lightning
Id: Hgg8Xy6IRig
Length: 28min 2sec (1682 seconds)
Published: Sat Aug 15 2020