Lecture 04 - Interest Point Detection

Video Statistics and Information

Captions
So we will talk about this interest point detector. As I said, in the previous lecture we talked about edge detection, so there we were looking at edges; here we are interested in interest points, and a good example of an interest point is a corner, because there are lots of edges here but there is only one corner. There are three main components. First we want to detect these interest points and identify them; as shown in the right picture, the circles are potential interest points. Then we want to describe each interest point with some feature vector: this is the descriptor for that particular interest point, a high-dimensional feature vector, for example in SIFT a 128-dimensional feature vector. And then ultimately we want to use these descriptors and correspondences to match two images. That is the main point: in order to analyze images or video, we want to relate one image with another image, and to do that we have to find something interesting here and, hopefully, find the same thing there, so we can say this point matches with that point. That is what is shown in this image: these are the potential matches, which a human can easily find, but the computer has to do it automatically.

Once we can do that, it has lots of applications. We can do tracking: in a video we have several frames and we want to track an object from frame to frame, so given a point on an object in this frame, where does it go in frame two, frame three, frame four? With these correspondences we can keep track of that, and there can be several objects, cars, people, bicycles, and so on, whose tracks we want to compute. We can do stereo matching, because we have two stereo images; we can do calibration; we can do segmentation; and we can recognize objects, since these can be features of the objects. We can do 3D reconstruction, and even robot navigation: the robot is moving, sensing the environment, and needs to analyze motion and know where it is. So these features, these interest points, are basic building blocks used in lots of these applications, including indexing and retrieval of images: given a query image, to find a similar image among lots of images we have to do matching, and matching uses these interest points.

Our goal is interest points which are repeatable, which means if we detect them in one image of a scene and in another image of the same scene, we should be able to detect similar interest points in both, and then we can match. Here is an example: in the left image the interest points we detect are shown as blue circles; in the right image the interest points we detected are the red ones, and there is a difficulty, we cannot really match them because they are at different places in the image, so we will have a hard time.

Generally the descriptor's role is to determine whether this interest point matches with an interest point in the other image. We look at the texture, the descriptor, around the interest point locally, and if it matches the other one locally, we say it is the same thing. Here is an example: suppose around one interest point our local window is here; if these are the potential matches, then we want to ask, does this look like that, does this look like that, and so on. How do we do that? One way would be to look at the RGB values, where each pixel has three values, red, green, and blue, and match those. But those values are not really good, because they are sensitive to noise and they vary with illumination and so on. So we want to come up with some better descriptor which is not sensitive to those changes.
That is the descriptor we are going to talk about. It has to be invariant to geometric transformations, since we may look at the same scene from different viewpoints, and also to photometric changes, the changes in appearance due to lighting and illumination conditions. Those are the desired properties of a descriptor.

Given these two images, if you look at a small neighborhood around each point: around here it looks like this, around here it looks like that, and similarly for the other image we have these descriptors; we want to pick the ones we can match easily. As you can see, this kind of appearance is very ambiguous, because it can match almost anywhere, so it is a problem. This one is also a problem, because something flat like this we will find in many places. But this one is a little better, because it is a more unique kind of profile, which we can match easily.

So detection is one important thing, and in general we should ask what we mean by an interest point. One requirement is that there is some texture, something in that area we can express which helps us distinguish it; if there is nothing, if it is flat, it is very difficult to distinguish. Another way to look at it is through the direction of an edge: if the direction of a boundary changes drastically, as shown here, that may be an interest point, and that is normally a corner. At a corner you have an edge, and then all of a sudden the edge direction changes; or it may be the intersection of two lines, which can also be an interest point. In a synthetic image with fixed gray levels, these corner points are very easy to identify: wherever there is a change, these are obvious corners, here is an intersection of two lines, and so on. And this is a real image, where there are some genuine corners on the windows and walls, but there are also lots of corners in the trees, where there is a lot of texture.

The properties of a good interest point detector are very similar to what we discussed for edge detection: it should detect all true interest points; there should be no false interest points; it should be well localized, giving us exactly the pixel location where the corner occurs, not delocalized; it should be robust to noise; and it should be efficient, so we can run it quickly.

One approach is to look at the brightness or appearance. You will hear these words, brightness, appearance, gray levels, color; they mean the same kind of information. And as I have said, in this course you will hear again and again about derivatives of images, because they provide a lot of information about change: the first derivative, the gradient, the Laplacian, and all those things. Another thing you will see again and again is the Gaussian filter; that is a kind of theme of this course. A second approach is to detect edges, fit lines, and look for intersecting lines or a change in direction: first you detect edges, then you look at the curvature, which is the rate of change of direction. These are the general ideas.

Now we will talk about a specific interest point detector which is pretty popular and works very well; lots of people use it. It was proposed in 1988 by a researcher in England called Harris. His idea was that we can recognize a corner point by looking through a small window: if you shift that small window in any direction, it should give a large change around a corner. That is the intuitive description, and here is an example. In one situation we are looking at this area and we keep shifting the window, and as you see there is no change, because everything is constant; so this is a flat region, not an interest point. In another example we move the window along an edge, and again it does not change much, it looks very similar; so this is also not a potential interest point. But if you look here, at the corner, the change is significant, because a different part of the image around the corner is exposed with each shift, and that is a good interest point, which we want to use.

The reason a corner is a good interest point is that it does not have the aperture problem. As I told you, if you look at the scene through a soda straw, a very small region, there is ambiguity most of the time, but if you look at the whole thing there is no ambiguity. In the illustration there is one shape which is red and another which is yellow: if you see the whole thing, you can of course say this point should correspond to that one, and this point to that one; the correspondence is easy because you are seeing the whole shape. But if you take the small region shown by the circle in the yellow shape and try to match it with the red one, it matches equally well here, and here, and here; locally they look exactly the same. That is called the aperture problem: you are looking at a local neighborhood, and it is ambiguous, it does not give a unique match. The corner does not have the aperture problem, because it is very unique: this kind of profile will not match with this one here, or this one here, or this one here; it is the unique place where the two edges intersect. A flat region is also a problem, because it can match anywhere on the other object. That is why we prefer the corner.

So the question is: given an image, how do we detect these interest points, these corner points? One can come up with some heuristic, do this, do that, which may or may not work, but we want to mathematically derive what we want to detect, and that is what we are going to do, using the notion of correlation we talked about before. Say F is an image and H is a kernel; we can correlate them by applying the mask H over F, and what we are doing is a pixel-by-pixel multiplication of F and H, summed up. If they are similar this value will be high; if they are not similar the value will be low. That is the idea of correlation, and we do this for every shift (i, j). Now there are two kinds of correlation. One is called cross-correlation, which is between two different functions or two different images, one the F image and the other the H image. The other is when you take the same image, F, and a shifted window of F itself, and compute the correlation by pixel-by-pixel multiplication; that is called autocorrelation, because we are doing it with itself, whereas the first one was F with H. So these are the two notions. Yes, a question: which case is which? The first one is cross-correlation, and the second one is called autocorrelation.
So cross-correlation means the two have to be different; this and this cannot be the same. In autocorrelation they are the same: in the example you see we are multiplying F with H, which have different values, but if somehow we correlate F with F, we are finding a correlation with itself, and that is called autocorrelation; when we say F and H, F is different from H, and that is cross-correlation. That is the idea.

Now another notion is called SSD, the sum of squared differences. It is very similar to correlation, but instead of multiplying we subtract: we take the value of F and the corresponding value of H, subtract, and square; then the next pair of values, subtract and square; and so on, looping over all of them. For a three-by-three mask, we subtract the nine corresponding pixels, square the differences, and sum them up. That is SSD, the sum of squared differences.

The interesting thing is that the sum of squared differences is closely related to correlation. If we want to know how similar F is to H, the sum of squared differences should be as small as possible, ideally zero if they are exactly the same; which means we want to minimize the SSD. In correlation we are multiplying, so if they are similar we want to maximize it. Take the SSD definition, F minus H squared, and expand it: it becomes F squared minus 2FH plus H squared, your usual expansion of (a - b) squared as a squared minus 2ab plus b squared. Now, when we want to find how similar F and H are, the F-squared term does not contribute anything, it only tells us about F; the H-squared term likewise does not tell us about similarity; the only term that contributes is the cross term, which is the multiplication. Therefore we can remove those two terms, F squared and H squared, and we are left with minus 2FH. For SSD we want to minimize, and since there is a minus sign, instead of minimizing this we can maximize the sum of FH; and as you see, that is exactly the correlation. So in correlation we multiply and maximize; in SSD we subtract, square, and minimize; and the two are related. That is what you need to know.

Given this background, we go back to the Harris interest point detector. We want to look at the SSD, essentially the change in intensity for a displacement (u, v): we take the intensity I(x, y) and the intensity shifted by (u, v), subtract, and square, exactly as in SSD:

E(u, v) = sum over (x, y) of w(x, y) [I(x + u, y + v) - I(x, y)]^2

In a way this is doing something similar to the autocorrelation we were talking about earlier. We want to do this for every possible shift, which is why E is a function of (u, v). In addition, we apply a weight w(x, y) to each pixel, as we talked about on Tuesday: the weights can be uniform, equally weighted, or we can apply Gaussian weights, to give more importance to pixels close to (x, y) compared to pixels further away.

Now look at this correlation surface, the SSD surface, at different points. Here is a point in a textured area: if you blow up the small window it looks like this, and if for each possible (u, v) we compute the value E and plot it, the surface looks like this, with a nice, unique minimum. If we take another pixel, here, the neighborhood is mostly flat, as shown here, and the (u, v) surface looks like this, which does not provide much information; it is very ambiguous.
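The SSD-correlation relationship described above is easy to check numerically. This is a small sketch, not from the lecture, using NumPy; the two patch arrays are made-up illustrative values:

```python
import numpy as np

# Two 3x3 patches F and H (arbitrary illustrative values)
F = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
H = np.array([[1.0, 2.0, 2.0],
              [4.0, 6.0, 6.0],
              [7.0, 7.0, 9.0]])

ssd  = np.sum((F - H) ** 2)   # sum of squared differences
corr = np.sum(F * H)          # (cross-)correlation score

# Expanding (F - H)^2 = F^2 - 2*F*H + H^2 and summing gives the same value,
# which is why minimizing SSD amounts to maximizing the correlation term.
expanded = np.sum(F ** 2) - 2 * corr + np.sum(H ** 2)
print(ssd, expanded)  # the two are equal
```

Since the F-squared and H-squared terms are fixed for given patches, only the correlation term varies with the match, just as the derivation says.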
If you take a point along the edge of the house, the surface has a profile like this; that is again the aperture problem, because there is no unique minimum there. Therefore we prefer the first kind of surface, which is the profile of an interest point, a corner.

We are going to derive this using the Taylor series, so there is another notion I want to explain very quickly. If you have any function f of one variable, we can represent that function in terms of the value of the function at a point and the derivatives of the function at that point: the first derivative, the second, the third, and so on. That is called the Taylor series; Taylor was a famous mathematician who said, let us approximate a function this way. It is a very simple and very useful idea. Here we are saying f(x) can be approximated by f(a), plus the first derivative of f evaluated at a times (x - a), plus one half the second derivative of f evaluated at a times (x - a) squared, and so on. That is the Taylor series, a way to approximate a function, and it will simplify a lot of things; you will encounter it again and again in this course. The idea is that there are a few key concepts we will build our discussion on, and once you are clear on those, I think you can do pretty well; for the sake of completeness I am going to cover each of them, so that I do not assume you know these things, even though you must have learned them in calculus.

Now, getting back to Harris: we have this expression for E(u, v), and what we are going to do is take the Taylor series of the function I(x + u, y + v), keeping everything else as it is, and approximate it around (x, y).
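The quality of the first-order Taylor approximation used in the next step can be sanity-checked on a smooth 1D signal. A hedged sketch, not from the lecture, using a sine wave as the "image" and numerical derivatives from np.gradient:

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 1000)
I = np.sin(x)            # a smooth 1D "image"
Ix = np.gradient(I, x)   # numerical first derivative

u = 0.05                 # a small shift, as in I(x + u)
true_shift = np.sin(x + u)   # exact shifted signal
taylor     = I + Ix * u      # first-order Taylor approximation I + Ix*u

err = np.max(np.abs(true_shift - taylor))
print(err)  # small, on the order of u^2 / 2
```

For small shifts the dropped higher-order terms are tiny, which is exactly why Harris can replace the shifted intensity with its linearization.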
Since I is a function of two dimensions, we differentiate with respect to x first and then with respect to y, so the first term is I(x, y), then the first derivative of I with respect to x, then with respect to y. The term (x + u - x) gives us u, because we are expanding at (x, y), and likewise (y + v - y) gives us v, and everything else stays the same. So we have the simple approximation of I(x + u, y + v) by the Taylor series up to first-order terms, with only the first derivatives with respect to x and y:

I(x + u, y + v) is approximately I(x, y) + u Ix + v Iy.

Now the interesting thing happens: as you see, this I(x, y) and the I(x, y) we subtract are the same, so they cancel, the weight w remains, and we are left with

E(u, v) is approximately the sum of w(x, y) (u Ix + v Iy)^2.

Then we are going to rearrange this. The inner term can be broken into two vectors, one is [Ix Iy] and the other is [u v]: if you multiply the 1-by-2 with the 2-by-1 you get a 1-by-1, the same scalar. Next, since the term is squared, we multiply it by its own transpose: so we have [u v], then the outer product of [Ix Iy] transpose with [Ix Iy], and then [u v] transpose, a multiplication of vectors with a matrix, with the weight w still in front. The weight is a scalar, so we can move it inside, and [u v] is just a vector that does not depend on (x, y), so we can take it outside the sum. We then group the middle part, the weighted sum of outer products, and define it as a matrix; it is not a constant, we are just grouping terms and calling it a matrix M, for convenience. Since we multiplied a 2-by-1 vector by a 1-by-2 vector, M is a 2-by-2 matrix, with each element weighted by w, and we have our [u v] in front and [u v] transpose behind:

E(u, v) is approximately [u v] M [u v]^T, where M is the sum of w(x, y) times the matrix [Ix^2, Ix Iy; Ix Iy, Iy^2].

So the first entry of M is Ix squared, then Ix times Iy in that row, then Iy times Ix, then Iy squared; that is your M matrix, which is shown here. Here you can see again that we are ending up with derivatives of images: Ix, Iy, Ix squared, Ix Iy, and all these things; it is pretty interesting that everything reduces to derivatives. If you look at it, [u v] M [u v]^T equal to a constant is actually the equation of an ellipse, and M is like a covariance matrix. We are now going to take this matrix M and look at its eigenvalues; again, this is a concept which is very easy and very intuitive, but I am going to explain what we mean by it.

Eigenvectors and eigenvalues are a very simple and very useful concept. An eigenvector x of a matrix A is a special vector: when you multiply the matrix A with x, you get the vector x back, multiplied by some scalar. That is, A x = lambda x; x is called an eigenvector and lambda is called an eigenvalue. It is a very simple idea. To find the eigenvalues of a matrix, you take the matrix A minus lambda times the identity matrix I, and set the determinant of that to zero: det(A - lambda I) = 0. Solving this gives you the eigenvalues of the matrix. Here is an example: suppose A is the 3-by-3 matrix shown; a 3-by-3 matrix has three eigenvalues, here 7, 3, and -1, and these are the corresponding eigenvectors: corresponding to 7 is (1, 4, 4), corresponding to 3 is (1, 2, 0), and so on. Every eigenvalue has an eigenvector corresponding to it, because A x = lambda x. So the procedure is: subtract lambda times the identity from A, take the determinant of the resulting 3-by-3 matrix, expanding along the rows and columns as you know how to do, and the roots are lambda = -1, lambda = 3, and lambda = 7, giving the three eigenvalues. Then for each eigenvalue you find the corresponding eigenvector by applying the definition, A x = lambda x: taking x common gives (A - lambda I) x = 0, a 3-by-3 linear system, three equations and three unknowns, from which you can solve for x1, x2, x3; that gives the eigenvector corresponding to the eigenvalue -1, and similarly for the others.
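In practice, finding eigenvalues is one library call. A minimal NumPy sketch, not from the lecture, using an illustrative 2-by-2 symmetric matrix (like the M matrix) rather than the 3-by-3 example from the slide, and verifying the defining property A v = lambda v:

```python
import numpy as np

# An illustrative symmetric 2x2 matrix, like the Harris M matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is the eigen-solver for symmetric matrices; eigenvalues are ascending
lambdas, vecs = np.linalg.eigh(A)
print(lambdas)  # eigenvalues of A: 1 and 3

# Verify A v = lambda v for each eigenpair (columns of vecs are eigenvectors)
for lam, v in zip(lambdas, vecs.T):
    assert np.allclose(A @ v, lam * v)

# Determinant = product of eigenvalues, trace = sum of eigenvalues,
# the facts used later to define the Harris R measure
assert np.isclose(np.linalg.det(A), np.prod(lambdas))
assert np.isclose(np.trace(A), np.sum(lambdas))
```

The last two checks are the identities that let Harris replace the explicit eigen-decomposition with det(M) and trace(M).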
If you take that eigenvector and multiply the matrix with it, you get the eigenvector back scaled by the eigenvalue, and you can verify that. So that is the notion of eigenvectors, eigenvalues, determinants, and all these things, which you should know; it is a pretty simple idea, but very useful.

Let us now get back to what we were doing before. We had E(u, v) equal to [u v] M [u v] transpose, with the matrix M defined as before, and we are going to look at the eigenvalues of this matrix; now you know how to find eigenvalues. This is actually the equation of an ellipse, which looks like this, with a major axis and a minor axis, and they depend on the eigenvalues: one direction tells you the maximum change, the other the minimum change, and those are the maximum and minimum eigenvalues. So now we want to look at the eigenvalues and decide whether a point should be an interest point, a corner, or not. We started from the definition using the sum of squared differences, that the shifted window has to be different, and we wanted to find the shift (u, v) behavior that picks out the best location for an interest point; and we found that we can rely on the eigenvalues of the matrix M, which contains the derivatives of the image: Ix squared, Ix Iy, and Iy squared.

Now look at the different values of the two eigenvalues, lambda 1 and lambda 2. If both lambda 1 and lambda 2 are very small, close to zero, that is a flat region: if you take a flat window, compute the M matrix, and go through all this to find the eigenvalues, you will land around here. The corners are in the region where lambda 1 and lambda 2 are both large and comparable, and E increases in all directions. Then there are the regions where lambda 1 is much greater than lambda 2, which is an edge in one direction, and where lambda 2 is much greater than lambda 1, which is an edge in the other direction. So lambda 1 and lambda 2, the eigenvalues of the matrix M, provide the information we can use.

The measure of cornerness is defined in terms of lambda 1 and lambda 2: we define a value R as their product minus some constant k times the square of their sum,

R = lambda1 * lambda2 - k (lambda1 + lambda2)^2.

There is another way to write this, R = det(M) - k * trace(M)^2, because the determinant of a matrix is the product of its eigenvalues and the trace is the sum of its eigenvalues, as shown here. So instead of looking at the two values lambda 1 and lambda 2, we can look at this single value R, which captures their sum and product. If the magnitude of R is small, the region is flat; if R is negative with large magnitude, it is an edge; and if R is positive and large, it is a corner. That is the criterion we can use, and it depends only on the eigenvalues of M.

Yes, the same question you asked last time: you have to have some notion of what is small and what is large. You can normalize R, say the largest value is 100, in terms of percentage, and the smallest is 0, and then apply a threshold. Everything ends up with a threshold, so it is better to normalize between zero and a hundred and then say what percentage you want: I want very good corners, 90 percent; or I want lots of corners, maybe 70 percent. It is subjective.
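The three cases of the R measure can be illustrated with a few hand-picked eigenvalue pairs. A sketch, not from the lecture; the eigenvalues are made-up values for the flat, edge, and corner cases, and k = 0.04 is a commonly used constant:

```python
def cornerness(lam1, lam2, k=0.04):
    """Harris response R = det(M) - k*trace(M)^2 = lam1*lam2 - k*(lam1 + lam2)^2."""
    return lam1 * lam2 - k * (lam1 + lam2) ** 2

r_flat   = cornerness(0.01, 0.01)  # both eigenvalues small
r_edge   = cornerness(10.0, 0.01)  # one eigenvalue much larger than the other
r_corner = cornerness(10.0, 8.0)   # both eigenvalues large and comparable
print(r_flat, r_edge, r_corner)
# flat: |R| small; edge: R negative with large magnitude; corner: R large and positive
```

This matches the classification in the lecture: the sign and magnitude of the single value R separate flat regions, edges, and corners without examining the two eigenvalues separately.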
Now suppose we have these two images of this animal. If you apply this Harris corner detector, you get R values like this, and then you can apply a threshold; these are the points where R is greater than the threshold. Then we take only the points which are local maxima. That is another step: if there are many pixels around the same location whose R is above the threshold, maybe the same corner has been detected many times, so you want non-maximum suppression, which means if a pixel is not a local maximum, do not select it; if a pixel is greater than its neighbors, it is a local maximum. Then you get fewer points and a much cleaner output.

There are different variants of the same Harris detector, with adjusted measures based on this R value. Triggs proposed simplifying it: look at lambda 1 minus some constant times lambda 2, how different the eigenvalues are, and use that as the measure. Szeliski, the author of the book which you can access online, has used the harmonic mean: take the product of the eigenvalues divided by their sum and use that as the metric. And there is Shi and Tomasi, who say: just look at the smaller eigenvalue. You can compare these different criteria, shown again in the lambda 1, lambda 2 plane: Harris is this curve, this is Szeliski's harmonic mean, and this is Shi and Tomasi. Each shows the region of the plane it accepts: if you use Shi and Tomasi, it is the region where the smaller eigenvalue is above a fixed value; if you use the harmonic mean, it is this curve shown here; and then Harris. You get similar outputs with some small differences; people have compared them, and I think any one of these is reasonable.

So in a nutshell, if you want to write a program for computing Harris corners, the steps are very simple. Given an image, first compute Ix and Iy, the derivatives with respect to x and with respect to y. Then, as you saw, the M matrix is symmetric with three distinct terms: Ix squared and Iy squared on the diagonal, and Ix Iy off the diagonal. So given these derivatives at each pixel, you compute three matrices: one is basically Ix squared, one is Iy squared, and one is Ix times Iy; these matrices are images of the same size as the input. Then you want to apply the window, and the best way to apply the window is to convolve all three images with a Gaussian, as we talked about. From those you get the matrix M at each pixel; then compute the eigenvalues, compute R, apply a threshold, find the local maxima, and those are the interest points. In MATLAB it is very easy to compute the eigenvalues: eig is a function which will give you the eigenvalues and eigenvectors of a matrix. That is the Harris corner detector, and there is a very nice description in this book, in section 4.1.1.

Any questions? Yes, this one. So, you want this at every pixel: you want to compute the M matrix, which as I said has three terms, Ix squared and Iy squared on the diagonal and Ix Iy off the diagonal. One way is to create three images: in the first image the value is Ix squared, because you have already computed Ix at every pixel, so just square it; the second image is Iy squared; the third is Ix Iy. Then use those to compute M for each pixel and find the eigenvalues. More questions? Go ahead.
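The steps just listed can be sketched in NumPy/SciPy rather than MATLAB. This is a minimal, unoptimized version, not the lecturer's code; the sigma, k, threshold, and neighborhood-size values are arbitrary choices for illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_corners(img, sigma=1.0, k=0.04, thresh=0.01):
    """Return a boolean map of Harris corners for a 2D grayscale image."""
    # 1. Image derivatives Ix, Iy (np.gradient returns row then column derivative)
    Iy, Ix = np.gradient(img.astype(float))
    # 2. The three product images: Ix^2, Iy^2, Ix*Iy
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    # 3. Windowing: Gaussian-weighted sums give the entries of M at each pixel
    Sxx = gaussian_filter(Ixx, sigma)
    Syy = gaussian_filter(Iyy, sigma)
    Sxy = gaussian_filter(Ixy, sigma)
    # 4. Cornerness R = det(M) - k * trace(M)^2, computed per pixel
    R = (Sxx * Syy - Sxy * Sxy) - k * (Sxx + Syy) ** 2
    # 5. Threshold (relative to the strongest response) plus local maxima
    return (R > thresh * R.max()) & (R == maximum_filter(R, size=5))

# A tiny synthetic test image: a bright square on a dark background,
# whose four corners should produce the strongest responses
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
corners = harris_corners(img)
print(np.argwhere(corners))
```

Note that, as discussed in the lecture, the derivatives are computed once for the whole image and the windowing is done by the Gaussian convolution, so no explicit per-pixel loop is needed.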
Yes, so what you are going to do is compute the derivatives for all the pixels of the image once; then, when you compute the M matrix, you do the summation in a small window, where you are applying the Gaussian filter. Since you are going to use these derivatives anyway, you just compute them once for all the pixels, then look at each pixel and do that kind of windowed summation there. Any other questions? More questions?

I think the main point here is that we always have to work with images, and we are trying to extract some useful information from them; in this case we want to find interest points. We have a mathematical definition of what an interest point is, we explained it, and then we came up with an algorithm, the steps I described, to compute the interest points. It all ends up in terms of the derivatives of the image, a Gaussian filter, a threshold, local maxima, and a window operation. These are the key things; once you understand them clearly, I think you can follow it. You have a question? Right now we have covered just the detector; even though we started talking about descriptors, we have not talked about the descriptor yet. If you are curious to try this out, if I were you I would just go home and write this program; it is a very simple program in MATLAB, since you already know how to find the derivatives of an image, and in a pretty short program you will be able to verify what you have done. Any questions before I go to the next one? Today I will actually end a little earlier, because I have to meet some visitors, so you will have fifteen minutes. Any other questions or comments? So far are you guys happy, is it going well, is it clear? And how about the videos, are the videos useful? Okay, so let's maybe get started with the next thing.
Info
Channel: UCF CRCV
Views: 98,722
Keywords: computer vision, UCF, computer science, image, video, machine learning, vision, computer, artificial intelligence, cs, corner, interest point, detector, feature, harris, eigenvalue, eigenvector, maxima
Id: _qgKQGsuKeQ
Length: 47min 10sec (2830 seconds)
Published: Wed Sep 19 2012