Image Segmentation using Mean Shift (Cyrill Stachniss, 2021)

Captions
Today I want to briefly look into image segmentation using the so-called mean shift algorithm. We want to dive into the details of how we can segment similar regions in an image as belonging together, which can be quite useful when we analyze images or search for similar regions. Consider an aerial image of a landscape, here showing agricultural fields. The question is: how can we identify similar regions based on this image alone, without providing any additional background information or training data on how such an image should be segmented, just by saying we want to group similar things together? One possible result is shown by these outlines: they mark areas that are similar to each other, breaking the image content down into similar regions. This can then be used to say: this is a field of type A, there is a field of type B, and a second field of type B sits next to it (so they could also have been segmented together, which did not happen here); over there is a field of type C, and this may be a road. So we can attach different types of information to these individual regions.

Quite often we are faced with the problem of having data points, in this case pixels in an image, that we want to group together. This is typically called a clustering problem, and it is usually an unsupervised process: I just say, here is my raw data, tell me what you think is similar. I typically need to provide a measure of similarity to the algorithm and let it group similar items together, without training data, i.e., without telling the system "here are 10,000 images grouped in the following way, please group these items similarly". That would be an approach using so-called training data, a supervised algorithm; we want to do this here unsupervised, without that type of training data.

Clustering in general, not just for images, is the problem of grouping similar data points, items, pixels, or regions and representing each group with a single entity or token. This representative can be an element that stems from the data points, or an average of them. Clustering is a tool useful not only in image analysis, photogrammetry, or computer vision, but for a large number of other tasks. We will focus here on images and exploit properties and algorithms that work particularly well for images; we are not looking into general clustering, where algorithms such as k-means are probably the most popular. Clustering becomes useful, for example, if you want to look for similar regions, analyze your image, or even visualize high-dimensional data.

So the goal is: we have an image and want to break it down into regions. In this example, this is a region that I may later define as a lake, this green region could be a rocky beach, these could be clouds, and this blue sky. The important thing to notice is that I am not providing those labels; I am not saying this is sky, this is a cloud, this is a lake. I let the system find similar-looking regions without additional training data. I may just say that pixel intensities which differ by, say, at most 10 intensity values are probably similar. This is information I typically need to provide if I do not provide training data: I need to give the algorithm an idea of what "close to each other" and "far away from each other" mean. A measure of similarity is something I have to provide at some point to these algorithms if I do not provide examples of what I consider close and what I consider different.

The key challenges in this general clustering problem are: what makes two data points, or two areas in an image, similar, and how do we compute an overall grouping from pairwise similarities? Just being able to say that two data points are similar is not necessarily enough to decide how the whole image, consisting of a large number of data points, should be grouped. There are different techniques for this; today I want to look into the mean shift algorithm, especially for segmenting images, so a form specialized to the task this lecture is partially concerned with. The mean shift algorithm was developed around 20 years ago and is a very versatile technique for performing this clustering in images, leading to a segmentation of those images. Here are two examples of landscape segmentations generated with mean shift. You can see that the segmentation is quite similar to what a human would probably produce, grouping semantically meaningful regions such as the lake, but it is important to note that this semantic information is not explicitly provided to the system; the data points are grouped purely based on the definition of similarity.

What we want now is an algorithm that finds those regions, and the mean shift algorithm does this by performing a non-parametric density estimation. The idea is: we have certain features in some space, and certain features, or features similar to each other, occur very often, so we can see the whole thing as a density function. My input values could be pixel intensities, RGB values, a combination of RGB values and xy locations in the image, or additional texture information, so I have a two-, three-, four-, or five-dimensional space on which I define my density function. If data points are very close together, as illustrated here, the density function should take high values there, because the concentration of data points is high in that region, and low values where there are hardly any points. If you visualize this as a hilly surface, the mountains are very peaked in areas with many data points and flat and low in areas with nearly none.

Mean shift performs such a density estimation and then asks: from a given starting point in this space, if I always walk uphill, never downhill, at which mountain peak will I arrive? If I start here, I end up at the peak marked by this red dot; if I start over here, I end up at that peak; and all starting points in this area end up at the highest peak. The black lines are the uphill paths of the data points, and all data points whose paths reach the same mountain peak are grouped together and form one cluster. That is the core idea of the mean shift algorithm. It starts at a point and always walks uphill: given a local estimate of the density, it moves in the direction that increases the density, i.e., toward the peak of the mountain, and then records at which mountain peak each data point ends up. This provides the clusters, the regions that should be grouped together. So it is a combination of estimating a density function and finding its maxima, the modes of the distribution; the mode a point reaches when walking uphill is the element that defines its cluster.

We can illustrate this quite nicely. Consider this as the spread of data points, or at least a local view of it. I start at some point, either a point from the data or some other point in the space, and ask: to which mountain peak does my start point belong? Let us go through this example and see where the approach leads us. We start with a data point, and this data point has a region of interest, which tells me the local area I should take into account to estimate the local density. This is something that I as a user need to provide: what is the size of this region of interest, and how strongly should the data points within it influence the estimate? If I make this region extremely small, no point falls into it, so I can only estimate a zero density, and that does not help me to walk uphill: standing in completely flat land, I do not know where to go. So I define a neighborhood that includes a couple of neighboring points that I can use to locally approximate the density function. If, on the other hand, I make the circle too huge, say the whole space, I flatten everything out and will find only a single mean in the end, because I take all data points into account: my local density immediately becomes a global, unimodal distribution, which is not what I want, because multiple clusters require multiple mountain peaks. So we collect all the points in this region of interest in order to compute a local density.
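To make this region-of-interest step concrete, here is a minimal sketch in Python; the function name and the toy points are my own illustration, not code from the lecture:

```python
import math

def points_in_window(center, points, radius):
    # Collect the data points that fall inside the circular
    # region of interest around the current estimate.
    return [p for p in points if math.dist(center, p) <= radius]

pts = [(0.0, 0.0), (0.5, 0.5), (3.0, 3.0)]
print(points_in_window((0.0, 0.0), pts, radius=1.0))  # [(0.0, 0.0), (0.5, 0.5)]
```

A too-small radius would return an empty list (no density to estimate), a huge radius would return all points, which is exactly the trade-off described above.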
How do we compute such a local pdf in a non-parametric way? Non-parametric, because I cannot assume the points to follow, say, a Gaussian or any other particular distribution; I just have those points. What we typically use is a so-called kernel function to average over the neighboring points: it tells us what the impact of each point is on the density at the query location. Usually, the closer a point is to the query location, the higher its impact, and the further away, the smaller: this nearby point should have a higher impact on the density at the query location than a point at the boundary of the neighborhood. What we are effectively doing is fitting a small distribution, typically of symmetric shape, around each individual data point: you can see this as placing a small Gaussian around every red point and summing up, over all points that fall into the region, what those Gaussians add up to. This sum serves as an approximation of the local distribution, and this is what is called kernel density estimation, or Parzen window estimation. I define a local neighborhood, the circle shown before, iterate over the M data points x_m in this circle, and ask how far each data point x_m is from my query point x. The kernel takes this distance into account, typically together with a bandwidth and a normalizing factor, and the distance term looks very much like the expression you put into the exponent of a Gaussian distribution. So if you use a Gaussian kernel of that size, you get a Gaussian smoothing of those individual points.
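The Parzen-window idea can be sketched as follows; this is my own minimal illustration with a simplified normalization constant and toy data, not code from the lecture:

```python
import math

def gaussian_kernel(dist_sq, h):
    # Gaussian profile: nearby points contribute more to the density.
    return math.exp(-dist_sq / (2.0 * h * h))

def kde(x, points, h):
    # Kernel density estimate at query point x (Parzen window):
    # sum a small Gaussian bump centred on every data point x_m.
    # (The normalization is simplified; only relative values matter here.)
    total = 0.0
    for xm in points:
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, xm))
        total += gaussian_kernel(dist_sq, h)
    return total / len(points)

points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0)]
# The density is higher near the dense pair than in the empty middle.
print(kde((1.1, 1.0), points, h=1.0) > kde((3.0, 3.0), points, h=1.0))  # True
```

With a Gaussian kernel the estimate is exactly the "sum of small Gaussians" picture: a smooth surface that is peaked where points concentrate.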
Okay, let us see what happens if we do this. We take our point and its region of interest and compute the weighted mean of all the points inside, taking the kernel weights into account. This weighted mean is a new center of mass, and moving to it is my uphill walk: I compute a vector, the so-called mean shift vector, which shifts my current point to the weighted mean of the points in the neighborhood, because that is where the weighted mean of the considered points actually lies. Then I repeat the process: I again compute a weighted mean, compute the shift vector, and move the circle accordingly. This is the uphill walk: I look at the density of points in the local neighborhood around me and move toward where the density increases. I repeat this, another uphill step, moving further until the new mean roughly coincides, up to some epsilon, with the previous point; at that moment I have found the mountain peak. So how do I compute this mean shift? The shift vector is the discrepancy between the current point and the weighted sum over all data points in the neighborhood, where each weight measures, under my kernel function, how far that point is from the current estimate, with the appropriate normalization so that the weights sum to one. I then move in the direction of this weighted mean; it tells me where to go, i.e., the update step I am taking.
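A single mean shift step and the iterated uphill walk might look like this; a minimal sketch using a Gaussian kernel and made-up toy points (my own illustration, not from the lecture):

```python
import math

def mean_shift_step(x, points, h):
    # One uphill step: move x to the kernel-weighted mean of the points.
    num = [0.0] * len(x)
    denom = 0.0
    for xm in points:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xm))
        w = math.exp(-d2 / (2.0 * h * h))   # Gaussian kernel weight
        denom += w
        num = [n + w * c for n, c in zip(num, xm)]
    return tuple(n / denom for n in num)    # normalized weighted mean

def climb(x, points, h, eps=1e-6, max_iter=500):
    # Iterate until the mean shift vector is shorter than eps: a mode.
    for _ in range(max_iter):
        x_new = mean_shift_step(x, points, h)
        if math.dist(x_new, x) < eps:
            break
        x = x_new
    return x_new

pts = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0)]
mode = climb((0.0, 0.0), pts, h=1.0)
print(mode)  # converges near the dense group around (1, 1)
```

The far-away point at (5, 5) receives a near-zero Gaussian weight, so the walk from (0, 0) is pulled only toward the dense group, exactly the "always uphill" behavior described above.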
For that we need to define the window, which is determined by the M points over which I iterate, and it also interacts with the bandwidth h: even if I take more points into account, depending on the size of h their impact may drop to practically zero further out. So x is my starting point, and the weighted mean is where I am moving to; the difference gives me my mean shift vector. I am always walking along the mean shift, toward the local mountain peak given my local view. That is how you can visualize it: the circle is the local view from your current point, you ask where the mountain peak lies under the kernel function within this local view, and that is where you move; in this way you gradually move upwards. If you have, for example, two clusters in the spread of points in xy space, then all points that fall into this blue region will end up at this mountain peak, all points over here will end up at that one, and these two regions, the red one and the blue one, form the two clusters in this space. Such a region is also called a basin of attraction: the set of points attracted to the same mode. All data points lying in one basin of attraction, which will be dragged to the same mode, define one region. The same holds in the initial example: if these are my points in a two-dimensional color space, we can color them based on the mode they end up in. Here you see the top-down view of the same surface, colored so that the color indicates the mode each point reaches. All data points in this region (again, this is a color space, not xy space) end up at this mountain peak, those three areas correspond to those three peaks, this straight line corresponds to that one, and the yellow region belongs to the highest peak. So I can start from every point, see where it ends up, and this provides my clustering: that is mean shift.

So mean shift seeks the modes of the density function of the points by always moving toward increasing density. Formulated as an algorithm, the key steps are: first, choose a kernel function, such as the Gaussian kernel, with its bandwidth h, and define the neighborhood (these choices are related). Then, for every data point, i.e., every pixel in my image: center the local window around the point, compute the weighted mean of the data points in the search window, i.e., the mean shift vector, move the window along this vector, and repeat until convergence to a mode. For every data point I then store at which mountain peak it ended up: this point at peak number one, that point at peak number two, and this provides the relevant cluster information.
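Putting these steps together, a complete toy mean shift clustering might be sketched like this; the function names, the mode-merging threshold, and the data are my own illustration, not from the lecture:

```python
import math

def shift_once(x, points, h):
    # One uphill step: kernel-weighted mean of all points.
    num = [0.0] * len(x)
    denom = 0.0
    for xm in points:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xm))
        w = math.exp(-d2 / (2.0 * h * h))
        denom += w
        num = [n + w * c for n, c in zip(num, xm)]
    return tuple(n / denom for n in num)

def mean_shift_cluster(points, h, eps=1e-4, max_iter=200):
    labels, modes = [], []
    for p in points:
        x = p
        for _ in range(max_iter):          # walk uphill until convergence
            x_new = shift_once(x, points, h)
            if math.dist(x_new, x) < eps:
                break
            x = x_new
        for i, m in enumerate(modes):      # merge with a mode closer than h
            if math.dist(x, m) < h:
                labels.append(i)
                break
        else:                              # otherwise this is a new mode
            modes.append(x)
            labels.append(len(modes) - 1)
    return labels, modes

pts = [(0, 0), (0.2, 0.1), (0.1, 0.3), (5, 5), (5.1, 4.9)]
labels, modes = mean_shift_cluster(pts, h=1.0)
print(labels)  # [0, 0, 0, 1, 1] -> two basins of attraction, two clusters
```

Note that the number of clusters (two here) is never specified; it falls out of how many modes the density has, which is one of the advantages discussed below.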
Then I assign all points that reached the same mountain peak, or peaks very close to each other, to the same cluster, and that is how I can effectively segment my image into regions. One thing you need to take into account is the input space. If you use a kernel, and again the standard choice is the Gaussian kernel, on what is it actually computed? As illustrated before, this can be a multi-dimensional space: color information, but typically also the position in the image. Quite often you do not want something green at the bottom and something green at the top of the image grouped into the same peak; sometimes you do, and then you ignore the position information, but often you want to take it into account. So the input space is not necessarily one- or two-dimensional as shown here; it can have higher dimensions, taking different types of features into account. You need to be able to compute a feature for every pixel: color information, position, possibly gradient information computed from a local neighborhood, or some measure of texture. For these individual dimensions I may need to define different kernels, because I will most likely use a different kernel for intensity values than for xy positions or a texture measure; so I typically have different kernel sizes, different radii within which I look for neighbors. Then it is the standard process: initialize a window at every location, perform the mean shift, and finally merge data points whose final positions lie within the size of my kernel.
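To make such a joint feature space concrete, one common trick is to divide each dimension by its own bandwidth, so that a single isotropic kernel can be applied to the combined vector; the sketch below, with made-up bandwidths and a tiny grayscale "image", is my own illustration, not from the lecture:

```python
def pixel_features(image, h_s, h_r):
    # Build a joint spatial+range feature per pixel, scaling position by the
    # spatial bandwidth h_s and intensity by the range bandwidth h_r, so one
    # kernel of radius 1 in feature space covers both notions of "close".
    feats = []
    for row_idx, row in enumerate(image):
        for col_idx, value in enumerate(row):
            feats.append((col_idx / h_s, row_idx / h_s, value / h_r))
    return feats

img = [[10, 12],       # a tiny 2x2 grayscale "image" (hypothetical values)
       [200, 205]]
print(pixel_features(img, h_s=4.0, h_r=20.0))
# [(0.0, 0.0, 0.5), (0.25, 0.0, 0.6), (0.0, 0.25, 10.0), (0.25, 0.25, 10.25)]
```

For an RGB image the range part would simply contribute three scaled dimensions instead of one; texture or gradient features would be appended and scaled the same way.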
Basically, I group points that ended up at similar locations, and if I do this I get pretty good results for image segmentation. It is also an approach without too many parameters to tweak: you need to define your kernel function and your input space, but then it is basically just the kernel sizes for the individual features that you adjust to get different results. Here is an example with the input images on one side and the output images on the other, and you can see from the uniform colors that the approach does a pretty good job of segmenting these images. Of course there are some mistakes if you think in terms of semantics, in the sense of a semantic segmentation where every pixel gets a semantic class: this is a shadow on the footpath, and of course it takes a different color because it appears different; the system does not know that a shadow does not change the scene and that this is actually the same footpath. That is nothing we can expect from a system without examples: we only specified what looks similar and what does not, and this dark gray pixel looks very different from that light gray pixel, so they are grouped into different segments. But that is very much expected for an approach not trained with training data. Similar examples from the landscape images show very reasonable segmentations, and the results here are pretty good considering that no labeled training examples are provided: a completely unsupervised approach where you only specify your distance function, i.e., what you consider close, basically through the kernel sizes.

A few pros and cons of the mean shift algorithm. What is very good: it is a very general approach; you can apply it to any type of image data, or even data outside the image domain, and it works fairly well compared to other algorithms. You do not need to specify how many clusters you have: with algorithms such as k-means you have to say "I expect five different segments in my image", whereas here the modes of the estimated density function themselves define the number of segments, which is very convenient. You are also fairly robust with respect to outliers, because of the smoothing the kernel brings in: individual outliers are smoothed away quite well. The disadvantage is that you have to choose the kernel sizes in advance, which can be tricky, especially for features that are not related to image location and are harder to interpret; you need to specify what "close" and "neighborhood" mean, and the size of the neighborhood can have a very big impact on the final result, deciding whether distinct segments are merged, or whether you get an over-segmentation where regions that actually belong together are broken up into multiple regions. Also, mean shift does not work that well for high-dimensional features, but for the standard image task it works very well.

This was a brief excursion into clustering, looking only at clustering for image segmentation, but I want to point out that clustering itself is a more general approach that can be used for completely different tasks than grouping similar regions in images. It is about grouping data points, wherever they come from, into similar items, and the only thing we need for that is a distance function defining, typically, pairwise similarity. Clustering is typically referred to as unsupervised, in contrast to supervised approaches where we provide label information: we do not need labeled training data for these clustering approaches, but we do need to specify a distance function. The mean shift algorithm is one popular technique for this and leads to pretty good results for image-based segmentation, but there is a large number of other techniques: Gaussian mixture model estimation, agglomerative clustering, or k-means, with k-means probably being the most popular clustering algorithm. So if you want to dive deeper into clustering, there is much more to explore; for the task of dealing with images, mean shift is a good standard choice to start with when you consider clustering image data and combining regions of an image into regions of the same type. The original work on mean shift, published around 20 years ago, provides a pretty good explanation of what it is and how it works, and you can also look into the Szeliski book for further details on clustering in the context of images. With this, thank you very much for your attention.
Info
Channel: Cyrill Stachniss
Views: 3,818
Keywords: robotics, photogrammetry, computer vision, image processing, lecture, bonn, StachnissLab
Id: 0h3bLXme46I
Length: 28min 18sec (1698 seconds)
Published: Wed May 05 2021