UNet for Image Segmentation - What You Need To Know! - Computer Vision

Video Statistics and Information

Captions
Today we're gonna talk about the famous U-Net, which is a fully convolutional neural network used for image segmentation. We will talk about what U-Net is, how it works, how to use it, and whether it's still relevant today. We've got quite something to cover today, so let's get into it.

Hi, for those of you who don't know me, I'm Johannes Frey, but you can simply call me Joe, and I've been working as a software engineer for more than 15 years. Before we start I would like to thank everyone who subscribed to my channel so far: you are the best. And to everyone else, don't you also want to be the best? So what are you waiting for? Go and hit that subscribe button.

Let's start by talking about what U-Net actually is. U-Net is an architecture for a fully convolutional neural network that specializes in image segmentation, also called semantic segmentation. Image segmentation, or semantic segmentation, is a procedure where you not only predict whether something specific is in a picture, like a dog or a cat, but also create a mask that shows where in the image that specific object is located and what its extent is. You can think of semantic segmentation as a variation of classification where basically every pixel in an image gets assigned a class that it belongs to, which then forms the mask. U-Net is fully convolutional since it contains only convolutional layers and does not contain any dense layers, also called fully connected layers. It was initially developed in Germany by researchers at the University of Freiburg for medical image segmentation. U-Net was initially developed as a single-class segmentation model, but it can also be used for multi-class segmentation; most implementations you'll find online support multi-class segmentation out of the box.

That all sounds very nice, but how does it work, you ask? Well, I've got my trusty iPad here, so let's do some explaining. Let's have a look at a picture of the U-Net architecture, and I hope you
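The "every pixel gets assigned a class" idea can be sketched in a few lines. This is a toy illustration with made-up class names, not part of the video: given per-pixel class scores, the mask is just a per-pixel argmax.

```python
import numpy as np

# Toy example: semantic segmentation as per-pixel classification.
# Suppose a network outputs a score for each of 3 hypothetical classes
# (0 = background, 1 = cat, 2 = dog) at every pixel of a 4x4 image.
rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 4, 3))  # shape (H, W, num_classes)

# The segmentation mask assigns each pixel the class with the highest score.
mask = scores.argmax(axis=-1)            # shape (H, W), values in {0, 1, 2}

print(mask.shape)  # (4, 4)
```

The mask has the same height and width as the scores, which is exactly why a segmentation network needs to produce a full-resolution output rather than a single class label.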
now understand why it's called U-Net. The U-Net architecture basically consists of two parts: the encoder and the decoder. Let's start with the encoder part. The encoder tries to understand the "what" of the image. It does that by using convolutions and max pooling, as is quite usual in convolutional neural networks. You can think of the architecture in terms of levels, so those are the levels that I'm talking about. Every level consists of two 3x3 convolutional layers, each followed by a ReLU activation. The transition between those levels is handled by a 2x2 max pooling layer with a stride of 2 for downsampling: you basically reduce the size of the input to the next level. This is basically the encoding, condensing the information with each layer and widening the receptive field. With every level that we go towards the bottom of the U shape, the size of the input halves, but the number of channels doubles. That way the network is able to learn more complex relationships in the image data. By the time we reach the bottom of the U shape, the model knows the "what" of the image fairly well.

So far this is basically what almost every convolutional neural network does. For classification you would at this point maybe add some dense layers and predict the output class, but for the use case of semantic segmentation this is not good enough. The "what" is not enough; we also need to know where in the initial image the objects are located and what their area is. So the next thing that we need to talk about is the decoder part. But before we do that, it would be really awesome if you could go completely insane and nuts on the subscribe button and all the notification icons that you can find. That would really help me out, and thank you very much.

The decoder is basically responsible for the "where" part of the model. It uses basically the same architecture for each level, but with a slight variation, or localization to be more precise: so-called skip connections are used, where the feature maps of
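The halving-and-doubling pattern down the contracting path can be traced with simple arithmetic. As a sketch (assuming the original paper's setup: 572x572 input, 64 starting channels, unpadded 3x3 convolutions, four pooling steps):

```python
def unet_encoder_sizes(size=572, channels=64, levels=4):
    """Trace spatial size and channel count down the contracting path
    of the original (unpadded) U-Net."""
    trace = []
    for _ in range(levels):
        size -= 4            # two unpadded 3x3 convolutions: -2 pixels each
        trace.append((size, channels))
        size //= 2           # 2x2 max pooling with stride 2 halves the size
        channels *= 2        # channel count doubles at the next level
    size -= 4                # two more convolutions at the bottom of the U
    trace.append((size, channels))
    return trace

print(unet_encoder_sizes())
# [(568, 64), (280, 128), (136, 256), (64, 512), (28, 1024)]
```

These numbers match the figure in the original paper: the 572x572 input shrinks to a 28x28 bottleneck while the channels grow from 64 to 1024.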
the encoder are concatenated to the output of the transposed convolutions of the same level. The idea behind this is: since we already learned those feature maps during the encoding process, why not use them as well to provide more information while decoding? Also, since it is our goal to get an output that is similar in size to the input, we need to upsample the condensed information at the bottom of our U shape back to the original size. So instead of max pooling between the levels, we this time use 2x2 transposed convolutions, also known as deconvolutions, to upsample the condensed representation of the image that we now have at the bottom of our U shape back towards the size of the original input image.

As you might have seen in the original paper, the output is much smaller than the initial input, though. In the paper the authors used unpadded convolutions, and because of this the output is smaller than the input. But it might be a good idea to use padding to keep the original image size; most implementations that you will find online will have a parameter for whether you want padding or not. For training the U-Net, the cross-entropy loss function is used, and as the activation function of the final convolutional layer the paper uses softmax. What also needs to be mentioned is that in the decoder part the procedure of the encoder is basically reversed: where in the encoder the image gets smaller and smaller but the number of channels gets bigger and bigger, in the decoder part it's the opposite. We upsample the data, which means the data gets bigger and bigger, but with each level the number of channels halves.

But now you might ask: well, this is all nice, but how do I actually use it? For me it is important to also show how to actually use the machine learning models that I present. So what would be the training data, the inputs and outputs, in this case? Since we are familiar with the concept of inputs and target variables in machine
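The expanding path can be traced with the same kind of arithmetic. A sketch under the original paper's unpadded setup (28x28 bottleneck, encoder feature maps of 64, 136, 280, and 568 pixels per side): because the encoder maps are larger than the upsampled decoder maps, they have to be center-cropped before concatenation, and the final output comes out at 388x388 for a 572x572 input.

```python
def unet_decoder_sizes(size=28, encoder_sizes=(64, 136, 280, 568)):
    """Trace spatial sizes up the expanding path of the original
    (unpadded) U-Net, starting from the 28x28 bottleneck.

    Returns (output_size, crop) per level, where crop is how many pixels
    the encoder's skip-connection feature map exceeds the upsampled map by.
    """
    steps = []
    for enc in encoder_sizes:
        size *= 2             # 2x2 transposed convolution doubles the size
        crop = enc - size     # encoder map is center-cropped to match
        size -= 4             # two unpadded 3x3 convolutions after concat
        steps.append((size, crop))
    return steps

print(unet_decoder_sizes())
# [(52, 8), (100, 32), (196, 80), (388, 176)]
```

The final 388x388 output matches the paper, which is exactly the input/output size mismatch mentioned above; with "same" padding the crops would all be zero and the output would keep the input size.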
learning, we need to provide those to our model so that it can learn where the objects are located in the inputs that we present to it. The inputs are easy: those are the images that we want to have segmented. That said, a word of advice: it is a good idea to pick training images that resemble the ones that we will use later in the real application, if possible of course. Determining the target variable can sometimes be more difficult. In the case of single-class segmentation, we usually need to provide a black-and-white segmentation mask that marks the area of the objects in the original image as the target. In the case of multi-class segmentation, we usually need to use different colors to mask the different objects. But since machine learning models need numbers to do their fancy calculations, you of course also need to load and convert the image to a 3D volume, or for better understanding you can also say a multi-dimensional array. But have no fear: TensorFlow provides all the tools that you need for that.

U-Net was first introduced in 2015, and when it was used in competitions it basically outperformed the other models by quite some margin. But since then quite some time has passed, and the question arises: is U-Net still relevant today? There are many far more complex state-of-the-art architectures around that might have better segmentation accuracy, but U-Net is still quite popular. The nice thing about U-Net is that it can get quite good results even with a limited set of training data, since it was developed with medical images in mind, and labeled training data is very rare in those use cases.

So that's it for this video. I hope it was informative in any way; if so, going completely nuts and insane on the subscribe button and all the notification icons that you can find would be really appreciated. Thanks for hanging out with me, and see you in the next video.
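Converting a color-coded multi-class mask into numbers can be sketched with numpy. The color-to-class mapping below is hypothetical; the actual colors depend entirely on how your dataset was labeled.

```python
import numpy as np

# Hypothetical color coding for a multi-class mask (dataset-dependent).
COLOR_TO_CLASS = {
    (0, 0, 0): 0,      # background
    (255, 0, 0): 1,    # e.g. "cat"
    (0, 255, 0): 2,    # e.g. "dog"
}

def rgb_mask_to_labels(mask_rgb):
    """Convert an (H, W, 3) color mask into an (H, W) array of class indices."""
    labels = np.zeros(mask_rgb.shape[:2], dtype=np.int64)
    for color, cls in COLOR_TO_CLASS.items():
        labels[np.all(mask_rgb == color, axis=-1)] = cls
    return labels

# A tiny 2x2 mask: background, "cat", "dog", "cat"
mask = np.array([[[0, 0, 0], [255, 0, 0]],
                 [[0, 255, 0], [255, 0, 0]]], dtype=np.uint8)
print(rgb_mask_to_labels(mask))
# [[0 1]
#  [2 1]]
```

The resulting integer label map is what a cross-entropy loss expects as the target for each pixel.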
Info
Channel: Johannes Frey
Views: 35,820
Keywords: unet architecture, unet, image processing, unet semantic segmentation, semantic segmentation, image segmentation, unet image segmentation, computer vision, deep learning, computer vision tutorial, unet image segmentation tensorflow, semantic segmentation unet, unet deep learning, unet segmentation, machine learning, artificial intelligence, unet image segmentation tutorial, unet segmentation keras, unet model, unet segmentation tutorial
Id: -dfSZ_uLfo8
Length: 9min 28sec (568 seconds)
Published: Sun Dec 05 2021