Visual Feature Part 1: Computing Keypoints (Cyrill Stachniss)

Video Statistics and Information

Captions
Welcome. Today I want to talk about visual features, as they play a very important role in photogrammetry, in computer vision, and in robotics, because those features allow us to describe an image, or parts of an image, in a very compact way, and they are used in a large number of applications. We have, for example, investigated the simultaneous localization and mapping problem using landmarks, and features are one way of detecting points in the environment that I may want to use as landmarks from my sensor data. They also allow me to make correspondences between images: if you have two images taken of the same scene, those visual features allow us to make point correspondences, and they are therefore very important for a large number of especially geometric estimation tasks, but they can also be used for recognition tasks, for example, where feature descriptors play an important role.

This lecture is split into two parts. The first one looks at the keypoint part, which is basically the location of distinct points in the image, and the second part of the lecture then looks into descriptors: how do we describe those keypoints so that we can use them, for example, for matching purposes.

So consider you have an image like this one, and you want to find distinct points in it. What you see here are a lot of red points drawn on top of this image, and these are points which are locally distinct, certain points that stick out, let's say, compared to their local neighborhood. We are interested in those points because we hope that they stand out, so that even if you take an image from a slightly different location, those points will still be locally distinct, and we can then use them for making correspondences, for example between two images taken of the same scene. The task today is how to actually find those distinct points; that's the first part, and in the second part of the lecture we will look into the question of how we describe those points so that we can distinguish them in a good way.

So again, why is this useful? Maybe we want to do a 3D reconstruction, and these are exactly the points which I want to use as a sparser representation of the original image, estimating the 3D locations of those points in the scene. That is one example. Another example is that I have two images of the same scene taken from different viewpoints, as you can see here, even taken at a different point in time, because here there is no Christmas tree and here a Christmas tree has been set up. We then want to make correspondences between those images: which point in image number one corresponds to which point in image number two? We will not be able to do that for all points, but at least for a subset of distinct points, and this is very important information for us if we want to estimate the camera motion, for example. So what you see here are two images of the same scene, and you can see a certain number of feature points, or keypoints, illustrated, where most of them are actually found in both images. Just by searching, for each of those keypoints, for a partner in the other image, I can establish correspondences, or at least an estimate of how the image correspondences will
look like.

There are two things we need to distinguish here. The first are the so-called keypoints, which are the locally distinct parts of the image, and we are interested in localizing those points in the image. For the keypoint, the important element is: where is the keypoint, where is a locally distinct area in the image? We are going to pinpoint this with a pixel location, or maybe even with sub-pixel accuracy. The second part, which is covered not by this video but by the second part of this lecture, is the feature descriptor: how can we describe this keypoint so that, if we have a large set of keypoints, we can distinguish those keypoints from each other? This is typically done by inspecting the local neighborhood of the keypoint, typically a small image patch around it, and the information stored in this image patch is condensed into, for example, a vector representation, so that in the end we have a set of keypoints and, associated with every keypoint, a descriptor vector. In some situations you may have descriptors which try to describe the whole image with one single descriptor; in this course we are looking into descriptors which describe a single keypoint. So we have the keypoint, let's say at a certain pixel location, and associated with this keypoint is a vector that describes the local neighborhood; often gradient information is used in order to achieve this, but there are also other means for describing a keypoint. The first part of the lecture will focus on the keypoints, and the second part will focus on the descriptors.

So we start with the keypoints, and the task can be summarized as finding distinct points in the image. We will look into a set of approaches for doing this. We start with Harris corners, one of the early techniques used to detect corners, an improvement of it called Shi-Tomasi, and the Förstner operator, which was actually the first one trying to find locally distinct points in this way; Harris and Shi-Tomasi, however, gained higher popularity over the years, and today Shi-Tomasi is one of the standard approaches. Then we look into another family of approaches called difference of Gaussians, where we basically take images which are differently blurred and compute the differences between those blurred images; this is one way of finding keypoints that is used in the SIFT feature, one of the popular descriptors that exist today. In the second part of the lecture we then look into feature descriptors such as SIFT, BRIEF, or ORB, but that's not part of this video.

So we start now with finding keypoints, or locally distinct points, in the image, and the first part looks at corners or corner-like structures. The key insight here is that corners are locally distinct points, or locally distinct structures, and this is typically a result of the gradients in the image. Image gradients means gradients in intensity values, for example from dark pixels to bright pixels, and what makes corners distinct is that they have gradients in two different directions, for example in the x direction and in the y direction, which allows them to be localized quite precisely. Besides corners there are also edges, and an edge is
typically a sudden change from a bright to a dark pixel or from a dark to a bright pixel. The problem with edges is that they are only well localized orthogonal to the edge, that means in the direction of the brightness change, and not along the edge created by a physical object. Corners, however, can be seen as two edges which intersect with each other and which are approximately orthogonal, and this allows them to be localized really well. Even if you change the illumination, let's say your images are brighter or a little bit darker, you typically have the effect that the gradients are still there and still visible; maybe the magnitude of the gradients changes, but the directions of the gradients remain, and therefore corners are very good for localizing points even if you have two images of the same scene taken under different conditions. That's the reason why people have been focusing explicitly on corners for finding good keypoints; it is still one of the standard techniques, it can be realized very quickly, and it is one of the standard choices you would make today.

So we said corners are basically two edges which are roughly orthogonal to each other, and an edge is a sudden change in brightness, that means a change in the intensity values of our image. We can measure this by looking, for every (x, y) location in the image, at the change in intensity values in a local neighborhood, and we can express this as a function. My function f, which takes the two-dimensional input x and y, computes, over a local patch W(x, y) around the position (x, y), the difference in intensity values between a location (u, v) and (u, v) plus some small offset, let's say one pixel to the right, just as an example. I square those differences and sum them up over the patch. In areas where this function takes large values, there are a lot of gradients in the image, so things that probably stick out. Looking into the squared differences of intensity values therefore tells us something about the gradient information, and it seems to be suitable information in order to compute locally distinct points. So what we are doing is computing the sum of squared differences of image intensity values under a shift (Δu, Δv); that's basically what the equation says.

What we then can do is approximate this function using gradient information, via a Taylor expansion with the first derivatives of the image function: the shifted intensity is approximately the intensity value at (u, v) plus the Jacobian of the image at that point multiplied by the shift (Δu, Δv). This Jacobian consists of the individual partial derivatives of the image, with respect to the x direction and with respect to the y direction. If I now put this approximation into the sum-of-squared-differences equation, we can see that the intensity value of the actual pixel cancels out, so I don't need the actual intensity value
of the image; what remains is just the Jacobian information. That means, through the Taylor expansion, this expression simplifies to the Jacobian times the shift, and this value is squared. We can put this into matrix form, and then we have our shift vector transposed, times a matrix which consists of the products of the partial derivatives of the image intensities, times the shift vector again, and as before we sum this over a local area. As a second step, we can take the sum and move it inside this matrix, because the shift (Δu, Δv) does not depend on the locations at which the gradients are computed; the sum only runs over those locations. So we can move the sum into the inner matrix, summing each of its elements, and the whole expression can be written in the following form.

This inner matrix is the important element now, because it contains all the information about the local patch: it contains the gradient information, while the shift vectors just sit outside. This matrix is called the structure matrix, and it summarizes the first derivatives of the image in a local patch; it accumulates the gradients in x direction, in y direction, and the mixed x-y term. So we have this structure matrix which tells us something about the directions the gradients point to and what the magnitudes of those gradients are, averaged over a local region, and this matrix will provide us with valuable information about how the gradients look in the image, which we can use in order to decide whether a point is locally distinct or not. The structure matrix is therefore key in finding edges and corners, and you can already see this if you compute the eigenvalues and eigenvectors of such a matrix: if you have one large eigenvalue and a very small one, this is more of an edge-like structure, because it means the gradients are all pointing more or less in the same direction; if you have two eigenvalues which are large and of similar size, we have a good distribution of gradients in different directions, which is more like a corner. So the structure matrix encodes the intensity changes in a local patch, it therefore provides useful information for finding locally distinct points, and it is just built from image gradients.
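To summarize the derivation of the last paragraphs in formulas, here is the chain of equations as I reconstruct it from the spoken description (the symbols f, W, I, J, and M are my notation, written in LaTeX):

    f(x,y) = \sum_{(u,v) \in W(x,y)} \left[ I(u+\Delta u,\, v+\Delta v) - I(u,v) \right]^2

    % first-order Taylor expansion of the shifted intensity:
    I(u+\Delta u,\, v+\Delta v) \approx I(u,v) + J(u,v) \begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix},
    \qquad J(u,v) = \begin{pmatrix} I_x(u,v) & I_y(u,v) \end{pmatrix}

    % the intensity I(u,v) cancels, leaving a quadratic form in the shift:
    f(x,y) \approx \begin{pmatrix} \Delta u & \Delta v \end{pmatrix} M \begin{pmatrix} \Delta u \\ \Delta v \end{pmatrix},
    \qquad M = \sum_{(u,v) \in W(x,y)} \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}

where I_x and I_y are the partial derivatives of the image and M is the structure matrix.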
So how do we obtain those image gradients, how do we compute the first derivative of an image function? Here we use the standard approach from the basics of image processing: the Sobel operator, or alternatively the Scharr operator, which are small kernels that we use to perform a convolution on our image. We have a derivative operator in x which is convolved with the image and squared in order to get the first diagonal element; for the off-diagonal elements we have the derivative operators in x and in y, each convolved once with the image; and the same thing for y for the second diagonal element. These operators compute the derivative in one direction, combined with a small smoothing in the other dimension, and the Scharr operator typically has somewhat better properties in terms of maintaining the gradient orientations with high accuracy compared to the Sobel operator; for our application here you can take either, it doesn't really matter. So by moving these operators over the image and performing the convolution, we can compute all the gradient elements; we then just need to sum over the elements that result from the convolution of the Scharr or Sobel operator with the image. This is something we can do in an efficient way with standard procedures, so computing the first derivatives of an image function is not as complicated as it may sound when you hear it for the first time.

What we then can do is look into the structure matrix. The numbers that sit in the structure matrix summarize the dominant directions of the gradients around the point for which the structure matrix is computed. Just as an example: if you have a local region with, let's say, bright pixels on one side and darker pixels on another, such that you have gradients pointing in two directions, this is clearly a situation where we have a corner, and we will see gradients in x as well as in y direction popping up. The two diagonal entries should then have approximately the same magnitude, because the gradients in this patch have roughly the same strength, so we would more or less have a diagonal matrix with similar values in the (1,1) and (2,2) entries; this means two eigenvalues which are similar, and that is a very good point for being localized. If the gradients point only in one direction, only one of those diagonal values would be large and the other one small. We can localize very well in the gradient direction: if you shift a patch in that direction, a mismatch appears; but orthogonal to it we can't localize at all, because there are no gradients pointing that way, and we could slide a template along the edge and always get the same value. So that is something which only allows me to localize a point well in one direction but not in the other, and is therefore not that great for finding locally distinct points. And in areas where there is no change in intensity values at all, localizing a point given this local patch is very bad: all the values will be very small, the eigenvalues close to zero, so this patch doesn't have a lot of texture, not a lot of local structure, and is not very good for matching procedures. Okay, so in the first example both diagonal values of the structure matrix take large values; in the second example, with gradients in only one direction, we have values different from zero only in one dimension; and in the third case all the values are pretty small.
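To make these steps concrete, here is a minimal sketch in Python with OpenCV and NumPy, computing the gradient images and the structure-matrix sums described above; the file name and the patch size are placeholder assumptions:

    import cv2
    import numpy as np

    # Convert to grayscale and float, as discussed for real-world images
    img = cv2.cvtColor(cv2.imread("scene.png"), cv2.COLOR_BGR2GRAY).astype(np.float32)

    # First derivatives via the Scharr operator (Sobel works equally well here)
    Ix = cv2.Scharr(img, cv2.CV_32F, 1, 0)  # gradient in x direction
    Iy = cv2.Scharr(img, cv2.CV_32F, 0, 1)  # gradient in y direction

    # Per-pixel products that fill the entries of the structure matrix
    Ixx, Ixy, Iyy = Ix * Ix, Ix * Iy, Iy * Iy

    # Summing over the local patch = unnormalized box filter over each product
    patch = (5, 5)  # patch size W, a free parameter
    Sxx = cv2.boxFilter(Ixx, -1, patch, normalize=False)
    Sxy = cv2.boxFilter(Ixy, -1, patch, normalize=False)
    Syy = cv2.boxFilter(Iyy, -1, patch, normalize=False)
    # (Sxx, Sxy, Syy) now hold the structure matrix M for every pixel location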
What we're interested in is only the first case; that's the one that is good for us, a locally distinct point, and the other two cases are not locally distinct. So we are looking for situations like the first one in order to find locally distinct points in our images.

As a result, we should not simply look for matrices with large values in particular entries and small values in others, because if you rotate your image by 45 degrees, for example, you would get a slightly different setup. What we are interested in is finding a matrix which has two large eigenvalues; this means we have two dominant directions, and as we have a two-dimensional image, all directions of the image then have dominant gradients. That means we need to make decisions based on, for example, the eigenvalues of this structure matrix, in order to say: yes, this is a good point, this patch has a lot of local texture which allows me to distinguish this area from other areas. There are different approaches that have been proposed to do this. This holds for Harris, Shi-Tomasi, as well as for the Förstner operator, which were proposed in the late 80s; Förstner was actually the first one, and the others followed shortly after. They all rely on the structure matrix, but they use different criteria in order to say: yes, this is a locally distinct point, or no, I am going to ignore this point. So the overall core idea underlying them is the same, namely using the structure matrix; what is different is the decision on when a point is considered to be a good, corner-like point or not. One more thing should be said about the Förstner approach: it also offers you a sub-pixel estimate of the corner location, which the others don't do in their standard setup.

Let's look at the Harris corner criterion. Harris corners were, at least initially, the most popular approach and have been used in a lot of approaches, so extracting Harris corners is a standard procedure in many computer vision tasks. There is a criterion which tries to separate good and bad points, and it basically uses the determinant and the trace of the structure matrix, which can be computed from the eigenvalues: the product of the eigenvalues and the squared sum of the eigenvalues, with a weighting factor k which is set to something between 0.04 and 0.06. From this we compute the criterion R. If R is approximately zero, we have a roughly flat region, which is something I am not interested in. If R takes values smaller than zero, one of the eigenvalues is substantially larger than the other, so this is the edge-like structure, also something we are not necessarily interested in. We are interested in values of R which are substantially larger than zero; that means we have two eigenvalues which are similar and both substantially different from zero, and this is exactly the corner that we want to have.
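The R criterion just described can be evaluated per pixel directly from the structure-matrix sums of the earlier sketch; k and the final threshold below are example values:

    k = 0.05  # weighting factor, typically between 0.04 and 0.06
    det_M = Sxx * Syy - Sxy * Sxy  # product of the two eigenvalues
    trace_M = Sxx + Syy            # sum of the two eigenvalues
    R = det_M - k * trace_M * trace_M

    # R >> 0: corner, R << 0: edge, R close to 0: flat region
    corner_mask = R > 0.01 * R.max()  # threshold is a free parameter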
We can also visualize this in a plot over the two eigenvalues, the first eigenvalue on one axis and the second on the other. Near the origin, where both eigenvalues take small values, are the flat points I am not interested in; the regions where one eigenvalue is substantially larger than the other are edge-like, and I want to ignore them as well; and the remaining area is the one I want to be in, where both eigenvalues are large and substantially different from zero. That's how Harris makes its decision, based on this value R.

Shi-Tomasi does it in a different way: it computes the eigenvalues and just considers the smallest one, which I can again compute using the trace and the determinant of my structure matrix, and then basically says: if the smallest eigenvalue is larger than a threshold T, I am happy, I can distinguish this point; if the smallest eigenvalue is smaller than the threshold T, I ignore it. Again we can put this into the two-dimensional drawing to see the difference between the Harris and the Shi-Tomasi corner detectors, and we would see a plot that looks like this: I have my threshold T sitting over here; that's the flat region, those are the edge-like regions, and I want to be in the remaining area. One decision region is basically a rectangle and the other one has this other shape, so you can see that the overall idea behind Shi-Tomasi and Harris corners is quite similar.
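The Shi-Tomasi criterion can be sketched the same way: for a 2x2 matrix the smallest eigenvalue follows in closed form from trace and determinant, and OpenCV's goodFeaturesToTrack wraps exactly this criterion. The threshold and the parameter values below are illustrative assumptions:

    # Smallest eigenvalue of the structure matrix at every pixel
    lam_min = 0.5 * (Sxx + Syy) - np.sqrt((0.5 * (Sxx - Syy))**2 + Sxy**2)
    shi_tomasi_mask = lam_min > 1e4  # threshold T, depends on image scaling

    # Equivalent high-level call; Shi-Tomasi is the OpenCV default criterion
    pts = cv2.goodFeaturesToTrack(img, maxCorners=500,
                                  qualityLevel=0.01, minDistance=10)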
The Förstner operator also proceeds similarly to the Harris corner detector; the difference is that it operates on the inverse of the structure matrix M, which can be seen as a covariance matrix of the possible shifts, so how certain I am about the shifts that I can estimate. The Förstner operator then explicitly optimizes for finding points that can be localized as well as possible, taking into account a criterion based on the size and the roundness of the error ellipses computed from this covariance matrix of the shift. So what Förstner basically does is minimize the uncertainty in the shift, selecting points that I can localize as precisely as possible, and it does this in a mathematically very sound way. In addition, this approach allows for sub-pixel estimation, so if you are especially interested in points you can localize as precisely as possible, for, let's say, stereo reconstruction, the Förstner operator is the way to go.

Förstner then basically performs a non-maxima suppression, taking points which actually stick out. That is something not only Förstner does; Harris and Shi-Tomasi do it as well, either using the R criterion or the minimum eigenvalue, basically checking, in every region, which of the points that survived the criterion still sticks out, so has the highest minimum eigenvalue or the largest R value, so that you keep the point which locally pops out.

We can illustrate that quite nicely with the Förstner operator. Again, this is our image, and we are now looking at three local neighborhoods that all contain the corner, and the question is which is the point that locally stands out, which gives me the highest localizability, the highest R value for example, in its region. A lot of people would intuitively say it is the case where the corner sits in the middle of the region, but that is actually not the case: the point which sits a little bit inside the corner is actually the better point. Why is that the case? If you look at the gradients that are accumulated within those regions, you can see that in one case the number of gradient pixels is small, only in a small local area; in the centered case you have the gradients in basically a quarter of the region; and in the best case you have the gradients along the two lines at their largest extent within the patch. So that is the point you would pick as your feature point, because that's the point that you can localize in the best possible way.

A few remarks if you want to implement this in reality, for real images. If you have RGB images, which I guess most of you will have, one typically converts them to a grayscale image and does all the operations on the grayscale image. What is also needed, given that real-world images are typically affected by noise, is that you often want to smooth the input image before you apply the corner detector; how much depends a bit on the noise level of your image.

To summarize how we can do this from a more operational point of view, here is the example for the Förstner operator. You start with your original image, and then you perform a smoothing of the image. This can be done by convolving the image with a smoothing kernel, or you convolve the smoothing kernel directly with the Scharr or Sobel operator, because you can first convolve those two kernels with each other and then apply the result to the image; you are then doing the smoothing and computing the first derivatives of your image in one step. That gives you the gradients in x and the gradients in y for every location, so this basically turns a single image into two images, where one image stores all the gradients in x direction and the other one stores all the gradients in y direction. Then you square those gradients, or multiply them with each other, so you get the three values JxJx, JxJy, JyJy, and you sum them up, which you can do with a box filter without the normalization run over your image, or you just accumulate the values by summing them up, which is mathematically the same but can be done efficiently as a convolution. Then you take those values and compute from them the minimum eigenvalue, for Förstner as well as for Shi-Tomasi, apply your criterion, like your minimum threshold, and then some non-maximum suppression approach sits behind this. So this pipeline basically just runs a few convolutions, multiplications, again convolutions for summing things up, then computing a square root, which is maybe the one slightly special operation, and finally thresholding, possibly in a local area with non-maximum suppression: all operations that are quite easy to execute.
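The final non-maximum suppression step of this pipeline can be sketched as keeping a response only where it is the maximum of its local neighborhood; the neighborhood size and the threshold are assumptions, and the max filter comes from SciPy:

    from scipy.ndimage import maximum_filter

    def non_max_suppression(response, size=9, threshold=1e4):
        """Keep only pixels that are the local maximum of the response map."""
        local_max = maximum_filter(response, size=size)
        return (response == local_max) & (response > threshold)

    keypoint_mask = non_max_suppression(lam_min)   # or R for the Harris criterion
    ys, xs = np.nonzero(keypoint_mask)             # pixel coordinates of keypoints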
Just as a small example: if you look at these brighter rectangles on a darker surface, you can see how the approach always finds and localizes the corners very well and precisely nails down the points that can be localized well in the image. Again, if you had a point sitting on one of the lines, you could slide that point along the line, so there is no local distinction between those positions; at the corners, however, it doesn't matter in which direction you move, the points are locally distinct, and if you slide a local patch around, a mismatch with the image is generated, which would not be the case if you slide along an edge and pick a pixel on the edge which is not a corner.

With the original example that I started with, the image and the Harris corners, we can now run this on multiple images and detect the corners in both images, and the next step would then be to make correspondences between those images. Of course, based on the different viewpoints, different corner points will pop up; they won't be identical, but a lot of them are similar and clearly have a corresponding partner. There is also noise in the process, because the images have been taken from different viewpoints under different lighting conditions, and especially the different viewpoints lead to a different projection of the 3D object into the 2D image space, so the detections are naturally not identical. But all three corner detection techniques, Harris, Shi-Tomasi, as well as the Förstner operator, perform very similarly, and the idea behind them is quite similar: they all use the structure matrix, which contains information about the local gradients in the neighborhood of a point, and where they differ is in the way the decision is made whether a point is locally distinct, either by using the R criterion or by looking at the minimum eigenvalue. Harris was a very famous corner detector and has been used very often in the past; Förstner was actually the first one, putting this into a neat mathematical framework of minimizing the uncertainty in the shift and also allowing for sub-pixel optimization; and today Shi-Tomasi seems to be slightly outperforming the Harris corner detector and is the standard choice in most libraries. If you look into OpenCV, the default corner detector is Shi-Tomasi. But again, none of them are that substantially different, because they all build upon the same idea of using the structure matrix.

The second thing I want to look into today, still in the area of keypoints, is the idea of the difference of Gaussians: looking into differently blurred images, images that have been convolved with different Gaussian filters, and subtracting them in order to find locally distinct points. With this approach, structures similar to corners but also blobs can be identified, and this is the technique which is used inside SIFT, one of the gold standards today, or at least one of the very popular descriptors among the hand-designed ones which are not learned. So this is the technique which sits behind the keypoints of the SIFT descriptor, and therefore it should be explicitly mentioned here. We can see this as a variant of corner detection which provides responses at corners, at certain types of edges, and also at blobs
which are distinct, so regions of roughly constant intensity which are nevertheless distinct from their surroundings. Think of a big black dot on a white surface: you could pinpoint the center of this black region, which is also something that locally sticks out, and that is something the difference-of-Gaussians approach can actually find.

So what is the key idea, and why is it called difference of Gaussians over a scale-space pyramid? The key idea is to first perform Gaussian smoothing: we run a Gaussian kernel over the image and smooth the image, and we do this with different kernels, from very small smoothing to extreme smoothing. It is similar to what happens if you use your standard image processing software, like Photoshop, and add a Gaussian blur: a Gaussian kernel is applied which performs the smoothing. What we then do is compare those differently smoothed images: we compute the difference of the smoothings, so a strongly smoothed image minus a slightly less strongly smoothed image, and we compare that one again with one which is a little bit less smoothed, and so on. So we always compare pairs of images, just computing the differences in the intensity values of these differently smoothed images, and then we select those locations which locally stand out, where the differences stand out both spatially and in the direction of the smoothing. In addition, we can do this over differently scaled images, so for the original image, for an image of half the size, of a quarter of the size, so that we are able to find points at different scales in the image.

Okay, let's have an illustration of how that looks. What we see here is my input image with different Gaussian kernels applied: this one has just a very small smoothing, this one is more smoothed, this one again more, up to extremely smoothed; these are all the same image, just with a different blur. Then I can take pairs of them and subtract them from each other, just subtracting the blurred intensity values, and this gives me new images which show the difference of Gaussians. I can see this stack basically as a three-dimensional structure, x, y, and the smoothing level as the third dimension, and then I am looking for points which locally stand out: points which are extrema, larger or smaller than all their neighbors in the same difference image as well as their neighbors in the less smoothed and the more smoothed difference images. Those points that stick out are the points I am interested in. And I do not perform this only for the original image: I also reduce the size of the image and perform the same operation on the images of different sizes, so again compute different variants of smoothing, compute the differences of Gaussians, and again find the points which locally stand out. I can then check where I get the strongest responses in terms of points that locally stand out, so that I get some invariance with respect to scale.
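A minimal sketch of the stack of blurred images and their differences, reusing the grayscale img from the earlier sketch; the sigma values (a constant factor of roughly 1.6 between levels) and the number of levels are arbitrary illustration choices:

    # Blur the image with progressively larger Gaussian kernels ...
    sigmas = [1.0, 1.6, 2.56, 4.10, 6.55]
    blurred = [cv2.GaussianBlur(img, (0, 0), s) for s in sigmas]

    # ... and subtract neighboring levels to obtain the difference-of-Gaussian images
    dog = np.stack([blurred[i + 1] - blurred[i] for i in range(len(blurred) - 1)])
    # dog has shape (levels - 1, height, width): x, y, and the smoothing direction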
So if I take a picture of an object, then move a few meters away and picture the object again, and run this detector, being closer to the object corresponds to working on a larger scale and being further away to a smaller scale, and if I take all of that into account, I find the same kind of points in the image taken further away and in the image taken closer to the object, so the same interest point is actually found.

Okay, here is an example. This is an original input image; I have blurred this input image and subtracted the two images from each other, and you can see that the local structure, the local gradients, actually stick out. I hope you can see it in the video, but the boundaries of the objects, of the leaf for example, stand out here. Then, by taking those points and comparing them with their neighbors in x and y, but also with their neighbors with respect to the blurring pyramid, we take the maxima or minima among those points. In this sketch-based representation: let's say this is the original image, which is not smoothed, this is the image which is slightly smoothed, a bit more, even more, and more and more; we compute the difference between each pair of neighboring images in order to get the difference images and compare the differences between them. For the more strongly smoothed images you can see that blobs stand out even more, and we basically look for extreme values in this representation. The blurring basically takes out the high-frequency noise, and we look at the larger-scale changes that remain, which is also why we are able to find blobs with this approach. You can see the subtraction of differently blurred images as a band-pass filter, where certain frequencies stay and certain frequencies are filtered out; so we are basically performing band-pass filtering between different frequency bands and then select the locations where we get the strongest response compared to all the neighbors across the different band-pass filters.

So the keypoints are extracted from this difference of Gaussians over the different smoothing scales, basically taking multiple variants of the image, stacking them on top of each other, and then looking in this three-dimensional space, also over the whole pyramid, for the points which stick out: those crosses over here, points which are extrema compared to their neighbors. What we also need to do is then perform a suppression step in order to get rid of certain structures like edges. So I am looking again for a test, similar to what Harris corners was doing, that checks for strong responses based on an eigenvalue criterion, so that we do not generate responses at edges; I want to have large eigenvalues in both directions, either to find corners or to find larger blobs which are surrounded by structure, because then you also get gradients pointing in multiple directions. So a similar extrema suppression as we used for the corner detectors can also be found in the difference-of-Gaussians approach.
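The extremum test over this three-dimensional stack can be made concrete with a brute-force sketch that compares each pixel against its 26 neighbors in x, y, and the smoothing direction; note that SIFT additionally refines and filters these candidates, and the threshold here assumes intensities scaled to [0, 255]:

    def dog_extrema(dog, threshold=8.0):
        """Return (level, y, x) of points larger/smaller than all 26 neighbors."""
        extrema = []
        for s in range(1, dog.shape[0] - 1):
            for y in range(1, dog.shape[1] - 1):
                for x in range(1, dog.shape[2] - 1):
                    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                    value = dog[s, y, x]
                    if abs(value) > threshold and (value == cube.max() or value == cube.min()):
                        extrema.append((s, y, x))
        return extrema

    candidates = dog_extrema(dog)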
Regarding keypoints, what I have done is introduce two groups of approaches for finding locally distinct points. The first one was clearly related to corners and operates on the structure matrix, so let's call them structure-matrix-based approaches, and I presented the three most commonly used techniques: Harris corners, Shi-Tomasi, and the Förstner operator. The second approach goes over the difference of Gaussians: taking differently blurred images, comparing those differently blurred images, looking for strong responses in them, and then also making certain tests, similar to the corner approaches, based on eigenvalues. This allows us to find corners, but in addition also blobs, which the first group of approaches would not find.

These are two ways of localizing features, and depending on your setup you may want to run with a corner detector, or you may want to go for the difference of Gaussians. A lot of SLAM systems use some kind of corner detector and then add a descriptor to it; or you go over the difference of Gaussians, which is what you would use internally if you use SIFT or similar features, and then use a feature descriptor that is a bit more expensive to compute but quite expressive, a feature that you can then also use in your SLAM algorithm for finding correspondences. These two approaches are central building blocks for most of what I would call hand-designed or manually designed features that we have today. Until maybe ten years ago, all features, or most of them, were hand-designed; over the last years we see more and more features being completely learned from data, using deep learning techniques for example, and then you don't have these building blocks explicitly in there. But some of the approaches for learning those descriptors actually put knowledge about corners as background information into the networks, or the networks find structures and convolution kernels which look very similar to the corner operators that you would design by hand. So these are key ingredients of all the manually designed features that we find today.

This brings me to the end of this lecture on keypoints. Just to summarize: keypoints tell me something about the position of locally distinct points, and descriptors tell me something about how they look, how I can describe them; we only talked about keypoints here and will talk about descriptors in a few minutes. The keypoint defines the location, the descriptor tells something about the local appearance. Most of the keypoint detectors use some form of gradient information, either in the form of the structure matrix or by computing differently smoothed images and subtracting them, which generates responses similar to what we see in terms of gradients. Corners are very good for that, and also blobs, at least if the surroundings of the blob are visible, are good keypoints which can be localized and for which we will compute feature descriptors in part two of this course. In part two we will take those points that one of those keypoint detectors, let's say the difference of Gaussians, generated for us, and then ask how we can actually describe them locally. So thank you very much for your attention, and I hope you are also interested in
the second part, where we look into the question of how we can describe the local neighborhood of a keypoint in order to be able to distinguish keypoints if we find them in multiple images. Thank you very much for your attention.
Info
Channel: Cyrill Stachniss
Views: 8,631
Keywords: robotics, photogrammetry, computer vision, image processing, lecture, bonn, StachnissLab
Id: nGya59Je4Bs
Length: 46min 3sec (2763 seconds)
Published: Tue Apr 07 2020