Implement and Train U-NET From Scratch for Image Segmentation - PyTorch

Video Statistics and Information

Captions
We are going to implement and train U-Net from scratch, and the results will look like this. We will do it on a Kaggle challenge dataset, the Carvana Image Masking Challenge. I will put the links to the dataset and to the GitHub repository in the description so you can check them out. First we will go over the U-Net paper, then we will talk about the project structure, and then we will start coding.

U-Net was originally designed for biomedical image segmentation, but it is a very popular architecture that has also been used for diffusion models. The main thing you need to know about U-Net is this U-shaped figure; this is the whole architecture. As a brief overview (we will go over it one by one during the coding), the architecture consists of downsampling on the left side, which shrinks the spatial size, upsampling on the right side, which grows it back, and a small bottleneck in between. So it is actually a very simple and very effective architecture. The paper shows results on biomedical images and gives some implementation details: the upsampling, the feature channels and resolutions, and how to implement them. It consists of the repeated application of two 3x3 convolutions, each followed by a ReLU; we will call this a double convolution, and you will see it during the coding. There are some training details and some data augmentation, but like I said, the main thing to keep in mind is the U-shaped figure.

The project structure looks like this. We will have a dataset file in which we will create a PyTorch Dataset to load our data. The unet.py file will contain the main U-Net architecture. The building blocks needed for that architecture will live in unet_parts.py. The training happens in main.py, which we execute for training. If we want to load the model and get a result like the one at the start of the video, we call inference.py. The data originally has two folders, train and train_masks: the train folder contains the JPEG images and train_masks contains their masks. I selected some example images for manual_test and manual_test_masks; they won't be included in the training, we will just use them to visualize the test results. So let's not lose time and get into the coding.

Our U-Net will consist of three parts. The first is the basic building block, the double convolution, which is the two blue arrows: a stack of two 3x3 convolutions with an activation function. The second is the down sample, which is the combination of a double convolution and a 2x2 max pooling layer. The third is the up sample, in which the output of a 2x2 up-convolution is fed into a double convolution; as you can see, the green arrow followed by the two blue arrows. We will implement them one by one and explain the process.

Let's start by importing our libraries; we will need torch and the nn module from torch. First we implement the double convolution class, which corresponds to the two blue arrows in the figure on the right. We initialize it with input channels and output channels. Since this is a PyTorch class, we subclass nn.Module and call super().__init__(). We gather the layers in an nn.Sequential: one blue arrow is a 2D convolution followed by a ReLU activation, and since we are combining two of them, we add another 2D convolution and another ReLU. This is the sequence for our double convolution. Now we define a forward pass: we take an input and return the result of passing it through that sequential block, so it applies all the convolution and ReLU operations.
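As a reference, here is a minimal sketch of that double-convolution block. The class and attribute names (DoubleConv, conv_op) are my own and may differ from the repository's exact identifiers; the padding of 1 is the choice discussed later in the video so that the output keeps the input's spatial size.

import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions, each followed by a ReLU (the two blue arrows)."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv_op = nn.Sequential(
            # padding=1 keeps the spatial size; the original paper uses no padding
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv_op(x)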
The second thing we are going to implement is the down sample part. As you can see on the right, a down sample is two blue arrows, which is a double convolution, and one red arrow, which is max pooling. We already implemented the double convolution, so we will reuse it. Since this is a PyTorch class, we give nn.Module as the parent of the class, initialize it with input channels and output channels, and again call super().__init__(). We define our double convolution and a MaxPool2d layer with kernel size 2 and stride 2. This class will be called DownSample, and it is the part you see on the right of the figure. We add a forward pass as usual: we give our input to the double convolution first (the two blue arrows), the output of that goes to the pooling layer, and we return them both, because we will use both in the architecture.

Now let's move to the up sample. We can see the upsample part on the right of the figure: it is denoted by a green arrow followed by a double convolution, the two blue arrows. Again we initialize it with input channels and output channels. We define the green arrow first, which is the 2x2 up-convolution; we can define it with ConvTranspose2d, with in_channels as input, in_channels divided by two as output, kernel size 2 and stride 2. Then we have the double convolution for the two blue arrows. These are all the layers we need for the up sample. Now we write the forward pass, which takes two inputs this time, x1 and x2. First we take x1 and pass it through the ConvTranspose2d, self.up, which is the green arrow. After that we take its output, concatenate it with x2 along dimension 1, feed the result to the double convolution, and return it. These are the main building blocks we are going to need for the U-Net. One thing I want to add: in the double convolution we use a padding of 1. In the original paper there is actually no padding, but we want our input image to be the same size as our output image, so we just add that padding of 1.
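Here is a minimal sketch of those two building blocks, assuming the DoubleConv class shown above is in scope (for example, all three live in unet_parts.py). The names DownSample and UpSample are assumptions.

import torch
import torch.nn as nn

class DownSample(nn.Module):
    """Double convolution followed by 2x2 max pooling (two blue arrows + red arrow)."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = DoubleConv(in_channels, out_channels)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        down = self.conv(x)   # kept for the skip connection
        p = self.pool(down)   # passed further down the encoder
        return down, p

class UpSample(nn.Module):
    """2x2 up-convolution, concatenation with the skip tensor, then double convolution."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_channels, in_channels // 2, kernel_size=2, stride=2)
        self.conv = DoubleConv(in_channels, out_channels)

    def forward(self, x1, x2):
        x1 = self.up(x1)                 # green arrow
        x = torch.cat([x1, x2], dim=1)   # join with the skip connection
        return self.conv(x)              # two blue arrows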
Now we are going to implement the U-Net itself. First we import the necessary torch libraries, and after that we import our custom building blocks from unet_parts. The U-Net consists of three parts: the left part is a stack of down samples, at the bottom we have a bottleneck, and the right part is a stack of up samples.

Let's define the UNet class; again it subclasses nn.Module. We initialize it with input channels and a number of classes, and call super().__init__(). We start with the down samples. If you look at the left section, as you remember, two blue arrows and one red arrow make one down sample, and we need four of those: the first goes from the input channels to 64, the second from 64 to 128, the third from 128 to 256, and the last one, as we can see, from 256 to 512.

Now it is time for the bottleneck. As you can see at the bottom, it uses two convolutions, so we can just use our double convolution here, from 512 to 1024. We move on to the up sampling. We have a stack of four up samples; as a reminder, one up sample is one green arrow plus two blue arrows, which we implemented in the UpSample class. The first upsample goes from 1024 to 512, the second from 512 to 256, the third from 256 to 128, and the fourth from 128 to 64, and the up-convolution path is done. What is left is the output layer: a 2D convolution with 64 input channels, since that is the output of our last upsample, output channels equal to our number of classes, and a kernel size of 1.

Now we define the forward pass of our architecture. x is our input. Each down sample returns two values: the first is the result of the double convolution and the second is the result of the pooling layer. We need them both, so we store them both, and we do the same for the second, third and fourth down samples, each time feeding in the pooled output of the previous one. Since we did all our down samplings, we move to the bottleneck: we feed the last pooling output into it. Then we move to the up sampling: for the first up sample we take the bottleneck output together with the result of the last down sample's convolution, the skip connection, and we keep going like this four times, each time pairing the previous upsample output with the matching down-sample output. With this we are done with the up sampling part. At the end we feed the last upsample output to the output layer we defined and return it. This is the U-Net architecture.

Now let's check if this works. Let's create a main block here. First I define a double convolution to check that it works as expected; then we create a dummy input image with a batch size of 1, 3 channels and a size of 512 by 512, define our UNet module, feed the dummy image to it and look at the output size. When we define our U-Net we set the number of classes to 10, so what we expect as output is 1 as the batch size, 10 as the number of classes, and the same image size, 512 by 512. Another detail: our image has three channels, so we pass 3 as the input-channels argument to our U-Net. Let's print this and check; the printed torch.Size of our output shows it is working as expected. Now let's go back to unet.py, delete this debugging line, and leave it like this.
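A condensed sketch of that unet.py could look like the following, assuming the building-block classes above live in unet_parts.py; the attribute names are assumptions, not necessarily the repository's exact ones.

import torch
import torch.nn as nn
from unet_parts import DoubleConv, DownSample, UpSample

class UNet(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.down_convolution_1 = DownSample(in_channels, 64)
        self.down_convolution_2 = DownSample(64, 128)
        self.down_convolution_3 = DownSample(128, 256)
        self.down_convolution_4 = DownSample(256, 512)

        self.bottle_neck = DoubleConv(512, 1024)

        self.up_convolution_1 = UpSample(1024, 512)
        self.up_convolution_2 = UpSample(512, 256)
        self.up_convolution_3 = UpSample(256, 128)
        self.up_convolution_4 = UpSample(128, 64)

        self.out = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        # encoder: keep the convolution outputs for the skip connections
        down_1, p1 = self.down_convolution_1(x)
        down_2, p2 = self.down_convolution_2(p1)
        down_3, p3 = self.down_convolution_3(p2)
        down_4, p4 = self.down_convolution_4(p3)

        b = self.bottle_neck(p4)

        # decoder: pair each upsample with the matching encoder output
        up_1 = self.up_convolution_1(b, down_4)
        up_2 = self.up_convolution_2(up_1, down_3)
        up_3 = self.up_convolution_3(up_2, down_2)
        up_4 = self.up_convolution_4(up_3, down_1)

        return self.out(up_4)

if __name__ == "__main__":
    dummy = torch.rand(1, 3, 512, 512)        # batch of 1, 3 channels, 512x512
    model = UNet(in_channels=3, num_classes=10)
    print(model(dummy).size())                 # expected: torch.Size([1, 10, 512, 512])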
To feed the data to our model we need a Dataset, and again we use PyTorch for this. The dataset we are using is from the Carvana image masking challenge. Our data is organized like this: under the data folder we have train and train_masks; train contains the main images in JPEG format and train_masks contains their masks. From these I chose some images to test our model at the end and put them in manual_test, with their corresponding masks in manual_test_masks. The two manual test folders are optional, but because we want to test our model at the end, we include them.

To implement the dataset we import the necessary libraries: Dataset from PyTorch, transforms from torchvision, and we define our dataset as CarvanaDataset. As arguments we have a root path, which points to your data directory, and a flag for whether it is a test dataset or not; by default it is not. We assign the root path, and if it is a test dataset we want to load the data from the manual directories, manual_test and manual_test_masks. So we list all the files under those directories, prepend the root path, and sort them; as a result we get sorted lists of image and mask names with full paths. The reason for sorting is that we want each image to correspond to its mask, and the correspondence is by name: if an image name starts with 0, its mask name also starts with 0, and we want them both at the same position in their lists; that's why we use sorted. If it is not a test dataset, we do the same with the other two directories used for training, train and train_masks.

We also need to transform our images. What we mean by this is that just opening the images is not enough: we have to resize them and convert them into tensors so that our model can understand them. First we resize to 512 by 512, then we convert to tensors. I forgot to mention that our dataset has to have at least three functions. Together with __init__, we need __getitem__, which takes an index (this is required by PyTorch, by the way) and returns the item at that index, and __len__, which returns the length of the dataset. In __getitem__ we open the image at the specified index; we can do it with Image.open, which we imported from PIL, and we already have the image list, so we just select the element at the given index. Since it is an image, we convert it to RGB with three channels; we do the same for the mask but convert it to "L", which is a single channel. After that we apply the transform we defined, which resizes them and converts them to tensors, and we return the image and the mask. For __len__, the final function we need, we just return the length of one of the lists. This completes the implementation of the dataset.
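A minimal sketch of that dataset class, under the assumption that the folders are named train, train_masks, manual_test and manual_test_masks as described above:

import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class CarvanaDataset(Dataset):
    def __init__(self, root_path, test=False):
        self.root_path = root_path
        img_dir = "manual_test" if test else "train"
        mask_dir = "manual_test_masks" if test else "train_masks"

        # sorted() keeps images and masks aligned by file name
        self.images = sorted(os.path.join(root_path, img_dir, f)
                             for f in os.listdir(os.path.join(root_path, img_dir)))
        self.masks = sorted(os.path.join(root_path, mask_dir, f)
                            for f in os.listdir(os.path.join(root_path, mask_dir)))

        self.transform = transforms.Compose([
            transforms.Resize((512, 512)),
            transforms.ToTensor(),
        ])

    def __getitem__(self, index):
        img = Image.open(self.images[index]).convert("RGB")  # 3-channel input image
        mask = Image.open(self.masks[index]).convert("L")    # 1-channel mask
        return self.transform(img), self.transform(mask)

    def __len__(self):
        return len(self.images)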
We have the model and we have the dataset; now it's time to put them together and train. Let's start by importing the necessary modules; we will need all of these imports, including our UNet module and our dataset from our own files. We define our hyperparameters first: I set the learning rate to 3e-4, but you can put whatever you want; for the batch size I use 32, and I will train for two epochs. The data path should point to your data; since I trained this on Colab, it points to my Colab directory. The model save path is where you want the model saved in the end. We define our device: if you are using a GPU it will be cuda, if you are on a CPU it will be cpu.

We create our dataset by passing the data path as an argument; this gives us a dataset object. Now we split it into train and validation; for that we use random_split from PyTorch. We pass the dataset we want to split, the fractions into which we want it separated (0.8 goes to the train dataset and 0.2 to the validation dataset), and a generator, which we define here. After that we create data loaders, which will help us during the training loop. We import DataLoader, give it the train dataset for the train data loader, set the batch size from the hyperparameters, and shuffle. We do the same for the validation data loader. Now we define the model: the input channels will be 3 and our number of classes is 1. Then we move it onto whatever device we are on; if we are on a GPU this puts the model on the GPU. We define our optimizer; I used AdamW, which you define by passing the model parameters and the learning rate. We also need a loss function.

With all that out of the way, it is time for the training loop. We iterate over our epochs, again a hyperparameter. For training we put the model into train mode, like this, so we are in training mode now. The running training loss starts at zero since we are just at the beginning. We iterate over the train data loader; each batch contains the image at the first index and the mask at the second index, as you may remember from the dataset where we return image and mask. We turn the image into floats and put it onto whatever device we are working on; we do the same with the mask. We get our prediction by calling the model with the image as an argument. Then we calculate the loss: we call the loss function with our prediction and the mask, and it returns the loss. We add loss.item(), which is the loss value, to the overall training loss that we initialized to zero. We zero the gradients, call loss.backward(), and step the optimizer; this makes sure that learning is happening.

Now we have completed an epoch of training; we need a validation pass too. First we put the model into evaluation mode. The validation loss starts at zero. We don't want to calculate gradients and waste compute, so we wrap the evaluation in torch.no_grad(). As in training, we iterate over the validation data loader with the same procedure: take the image, make it a float, put it on the device, get the prediction, get the loss and update the running loss. After the loop is complete, we divide the validation loss by the number of batches, which is the last index plus one. I want to draw your attention to one thing: the backward and optimizer steps are what make learning happen, and we don't include them here because we are just doing evaluation, so there is no need. For every epoch we print the train loss and validation loss to track what's going on; by the way, this part is all custom, you can do it however you like. After all this is finished we have a model and we want to save it: we simply call torch.save and, as arguments, give the state dictionary of the model and the path we want it saved to. This completes our training loop. To start the training you can just go to your terminal and run python main.py. I won't run it here because I don't have a GPU; I will do it on Colab. When your training is done, let's meet again.
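A condensed sketch of what that main.py could look like. The loss function is not named in this part of the walkthrough, so BCEWithLogitsLoss (a common choice for a single-channel mask), the seed, and the path constants and module names below are assumptions.

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, random_split

from unet import UNet
from carvana_dataset import CarvanaDataset   # assumed file/module names

LEARNING_RATE = 3e-4
BATCH_SIZE = 32
EPOCHS = 2
DATA_PATH = "./data"             # point this at your own data directory
MODEL_SAVE_PATH = "./unet.pth"

device = "cuda" if torch.cuda.is_available() else "cpu"

dataset = CarvanaDataset(DATA_PATH)
generator = torch.Generator().manual_seed(42)
train_dataset, val_dataset = random_split(dataset, [0.8, 0.2], generator=generator)

train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=True)

model = UNet(in_channels=3, num_classes=1).to(device)
optimizer = optim.AdamW(model.parameters(), lr=LEARNING_RATE)
criterion = nn.BCEWithLogitsLoss()   # assumption: binary-mask loss

for epoch in range(EPOCHS):
    model.train()
    train_loss = 0.0
    for img, mask in train_dataloader:
        img = img.float().to(device)
        mask = mask.float().to(device)

        pred = model(img)
        loss = criterion(pred, mask)
        train_loss += loss.item()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    model.eval()
    val_loss = 0.0
    with torch.no_grad():            # no gradients needed during evaluation
        for idx, (img, mask) in enumerate(val_dataloader):
            img = img.float().to(device)
            mask = mask.float().to(device)
            val_loss += criterion(model(img), mask).item()
    val_loss /= (idx + 1)

    print(f"Epoch {epoch + 1}: "
          f"train loss {train_loss / len(train_dataloader):.4f}, val loss {val_loss:.4f}")

torch.save(model.state_dict(), MODEL_SAVE_PATH)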
Now, to use our model, it's time to do some predictions. I won't get into the details of the code here, but we have two functions: one for predicting multiple images at once and one for predicting a single image; they work in the same manner, so I will briefly go over them. We define the model here and load it from the model path, wherever your model is located. We pass map_location with a torch device; this overcomes the incompatibility between CPU and GPU. We transform our image like in the dataset. Since we don't have a batch dimension, we unsqueeze to add a virtual batch dimension. We get our prediction by feeding the image to the model, discard the virtual batch dimension with squeeze, move the result to the CPU and detach it. Then we want to switch the channels of the image because we are going to plot it; we do that with permute. After that we have the prediction mask; again we remove the virtual batch dimension and permute it so we can see the mask clearly.

After these operations we are just going to plot. We initialize the figure; since we will have two images side by side, we loop over a range of two and add a subplot with one row and two columns. In the first iteration we show the image; mapping it to gray is optional, you don't have to, but it looks better. In the second iteration we show the predicted mask and call show. We follow a similar procedure for multiple images, which is here. Now we define our single image path, which is an image from manual_test that did not take part in the training, our data path, and the model path I saved my model to, and we just feed them to the functions as arguments. Okay, let's try this; it may take a bit of time. This is the result for multiple images: at the top we have the original images, in the middle we have the ground-truth masks, so the actual masks, and at the bottom are the masks that we predicted. That was for multiple images, and here is the result for a single image. With this we have completed the inference part too. That's all for today; I will put the code in the description, and see you.
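For reference, here is a minimal sketch of the single-image prediction described above; the function name, file paths and plotting layout are assumptions rather than the video's exact code.

import torch
import matplotlib.pyplot as plt
from PIL import Image
from torchvision import transforms

from unet import UNet

def single_image_inference(image_path, model_path, device):
    # load the trained weights; map_location also handles a CPU-only machine
    model = UNet(in_channels=3, num_classes=1).to(device)
    model.load_state_dict(torch.load(model_path, map_location=torch.device(device)))
    model.eval()

    transform = transforms.Compose([
        transforms.Resize((512, 512)),
        transforms.ToTensor(),
    ])

    img = transform(Image.open(image_path).convert("RGB")).float().to(device)
    img = img.unsqueeze(0)                         # add a virtual batch dimension

    with torch.no_grad():
        pred = model(img)

    # drop the batch dimension and reorder channels for plotting
    img_plot = img.squeeze(0).cpu().permute(1, 2, 0)                       # (H, W, 3)
    mask_plot = pred.squeeze(0).cpu().detach().permute(1, 2, 0).squeeze(-1)  # (H, W)

    fig = plt.figure()
    for i in range(1, 3):
        fig.add_subplot(1, 2, i)
        plt.imshow(img_plot if i == 1 else mask_plot, cmap="gray")
    plt.show()

if __name__ == "__main__":
    SINGLE_IMG_PATH = "./data/manual_test/example.jpg"   # hypothetical example path
    MODEL_PATH = "./unet.pth"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    single_image_inference(SINGLE_IMG_PATH, MODEL_PATH, device)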
Info
Channel: Uygar Kurt
Views: 5,826
Id: HS3Q_90hnDg
Length: 21min 37sec (1297 seconds)
Published: Sun Jul 23 2023