Direct Solution for Estimating the Fundamental and Essential Matrix (Cyrill Stachniss)

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
so welcome everyone to today's lecture and I want to talk today about the fundamental matrix and the essential matrix and we are specially interested here in actually computing both matrices but now I'm not taking into account projection matrices or the exterior interior information but we want to estimate this from data so we assume just to have images taken from the same scene a pair of images and we want to estimate the fundamental matrix or the essential matrix that means one estimate where those cameras have been located relative to each other when they took two images so just let me know straight how that could look like so consider we have those two images here from Paris and we want to know where has the photographer beam when taking one image with respect to the other image and this is exactly a task that we aim at doing without knowing how large the object is or how the object exactly look like the only thing we assume to know is that we have corresponding points in those images so we you see those red dots over here in image number one and image number two and we assume that we know the correspondences between those points so for example that this point over here corresponds to this point and at this point over here corresponds at that point and so on and so forth so the only thing we assume to know is to have point features extracted from those images and that we know the correspondences between those point features and that's all we assume to know and then we want to be able to compute either the fundamental matrix if we work with uncalibrated cameras or the essential matrix so that we have been able to obtain the rotation matrix and the baseline vector between the two camera locations in most cases we will use calibrated camera so we are interested in estimating the essential matrix but the solution that we are going to provide is actually very similar for the fundamental matrix empty central matrix so we will start with the fundamental matrix then move over to essential matrix and then in the extract R and B out of the essential matrix so the topics today are computing the fundamental matrix only given corresponding points there in the second part computing the essential matrix so the same thing for the calibrated camera just having a set of corresponding points in those images and at last step we explain how we extract the rotation matrix and the basis or the baseline back door out of the central matrix of course only the direction not the length of that vector that's something that we cannot extract from the essential matrix which only has these five degrees of freedom so three degrees of freedom encoding the rotation and two degrees of freedom for the translation so we don't know how far the camera cameras are apart we only know the direction vector but let's start with the fundamental matrix so goal now is computing the fundamental matrix giving a set of corresponding points before we start let's have a quick look what the fundamental matrix was the fundamental matrix was defined in the following way so it consists of the calibration matrix of the first camera of the rotation matrix of the first camera and the same thing for the second camera as well as schoo symmetric matrix which encodes the base basis vector or baseline vector between the true projection centers of my cameras and it's defined for the uncalibrated camera so we do not assume our camera to be calibrated we don't have any information about the intrinsic s-- but what the fundamental matrix allows us to do it allows us to in a very elegant way formulate the COPE linearity constraint that means if we have two corresponding points M Prime and double prime so this one image one and this one from image tool with X prime transposed F X double prime equals to zero we can encode the COPE linearity constraint constraint so whenever we have two points which are the same point in reality and just created two images so one image one one and the other one image two then this product must be zero if the product is zero I cannot necessarily infer that this is a corresponding point it only works in the other if these are corresponding points and they actually correspond then this equation must hold so it's an implication only in one direction okay so let's have a look how we can compute the fundamental matrix so we can compute the fundamental matrix as we discussed so far if we know the calibration matrices if we know the rotation matrices of both cameras and if we know the sku symmetric matrix or the baseline then we know how to do that we also have seen that we can do this if we know the projection matrices so if you know the projection matrix of the first camera the second camera which encodes all this information then we can also directly provide the fundamental matrix but we want to go a different path today and see how can we do this if we don't have this information and the only information we have are the images and corresponding points in these images and our goal is to estimate F just based on those correspondences okay so let's look into our problem our problem says we have a set of n corresponding points given so X prime Y prime X double prime Y double prime these are the pixel coordinates x coordinate of the pixel y coordinate of the pixel image 1 and x coordinate and y coordinate of the pixel in image number 2 and this is the index small N and this tells us the first corresponding point second corresponding point third corresponding point so that we have the correspondences between them and what we want to get out is the fundamental matrix F and we can do this by exploiting this cop linearity constraint that we can formulate using the fundamental matrix so for every pair of points so every point in image number one and it's corresponding partner in image number two we can formulate our coke linearity constraint so the commonality constraint was X prime transpose to half X double prime must be equal to 0 so we know that this equation holds for all the pairs of points that we actually have so if I expand this expression so what does what are this actually mean this means nothing else then a three-dimensional homogeneous vector so XY and 1 XY 1 on the other side as well and our fundamental matrix which is 3 by 3 matrix so we have nine elements that we need to fill although we know it's only defined up to a scale factor bickerson homogeneous matrix and there's another constraint that we need to take into account so these elements here are our unknowns right so the unknowns that we want to estimate because this coordinate and this coordinate are my observations thought this is given and these are my unknown parameters so I have a regular function which relates my unknowns with my knowns and should be equal to 0 so again as we have seen it already for the DLT for example and standard least squares we have a vector of unknowns now we have a matrix of unknown elements so what we need to do we need to basically turn this matrix of unknowns into a vector of unknowns which we can do in a very straightforward way by just basically taking all those 9 elements and arranging them in a vector and this makes sense and we can do this in the streets actually to a linear fashion our unknowns we can actually visualize if we compute the product of the vector matrix vector so if I compute the the result of this so the first element that comes out of this will be X 1 prime f11 X double prime n so this is kind of the first element here and then the second element here will be multiplied with this element and with the first element over here which gives me this element and then I can continue this one must but the best one must part with this one and so on and so forth so I can basically expand that equation the important thing that I can see in here is that this F so my my unknowns just appear as a linear term in all those parts of the sum and all the other element X double Prime and X Prime these are all known elements so they are basically a coefficient towards the unknowns and these are all linear terms just all added up so what I can do and it was basically the same way as we have done it for the when we're competing the deal team we can put all the coefficients into one coefficient vector over here and then multiply that with our fundamental matrix so that's these also turns into vectors so we have our coefficient vector over here and then our unknown vector over here so we can simply put that together in a vector times a second vector so our coefficients and our unknowns and I'm basically getting those coefficients for every corresponding pair so every pair of points will generate me one of those equations so just in terms of limitation there's actually a standard notation for stacking this unknowns from the matrix into a vector and for actually computing that this coefficient vector which is the chronica product so if you use any mathematical toolbox or MATLAB or some other Python framework you can just use the Kronecker product in order to build this coefficient vector out and of the individual vectors that you had the kind of small vectors for the coordinate in image one image number two and this is X double prime chronica product X prime transposing this vector so this will actually give me the coefficient vector and the second part over here is something which is called vector eyes so that you can vectorize your matrix and it turns the vector or the matrix into a vector and then you can just execute the chronica product multiplying this the vectorized form of F which is exactly the product that we have been computing so you don't need to build that up by hand these coefficients you can use a standard operator in order to do this and this holds actually in general whenever you have does it's not doesn't only hold for the fundamental matrix its hold for all product of this forum that you can express this through the Kronecker product and the vectorization of your matrix and this is sometimes handy if you need to do these computations and you want to estimate those quantities so again all corresponding points generate one of those equations so for every corresponding point we get one of these equations so if we have n corresponding points we get actually n of those equations and n of those coefficient vectors so we can actually stack them together in a matrix so these are again all my coefficient vectors a just stick together in a matrix for corresponding point one or the four are the correspondences point one point two point three until point n so that I then get a matrix a and that I had the resulting equation eight times F equals zero where F is my vector of unknowns and a is my coefficient matrix and this should be able to zero and that's kind of the system that we need to solve and we have seen already a few times how that works is kind of the standard tool that we can take out of our pocket in order to solve this is the singular value decomposition so we perform a singular value decomposition of a in order to solve this homogeneous system and the singular value decomposition directly provides us a solution for my F which I then can use to reconstruct my matrix that brings this equation to zero and this is done by the what the SVD does it decomposes the matrix a into three matrices the matrix in the middle is a diagonal matrix which contains the singular values of this matrix a and if we actually have a solution so if there's a null space and a of a then one of those singular values will be 0 and we take the corresponding singular vector corresponds to the singular value of 0 which is in this case then the last one in the third matrix that we get out and this is our solution this is the singular vector corresponding to the singular value of 0 and this is my solution for f so I can just take this column or row depending how you did your an SVD out of the last of the third matrix which contains the singular vectors and then this is my solution to that system so a few words how many points do we actually need not to perform this so the vector F has nine dimensions so we have nine unknowns and we need to solve the system a F equals to zero but we know already that there that a must have rank deficiency and a can have at most rank eight we have our nine elements in here and then the number of points that we have so if we if we have a solution here then the matrix a has at most rank eight so we need eight corresponding points because we need to generate here eight rows in order to get that rank 8 in reality however the world is slightly different in reality our measurements are noisy so with more than eight points the matrix a will become a regular matrix and not single anymore and that's something that we want so we want to have the a matrix which rank 8 otherwise we couldn't solve a F equals to 0 again also as we have done that in the past when you have this type of estimation problems we were simply then performing that as a minimization problem so we are not bringing a times F equals to zero we are minimizing this expression under the constraint that for example F is norm one and then this we can use exactly the same approach of the SVD and simply now take the smallest singular value so it's not zero anymore but it's a small value and take the corresponding singular vector because this is one which now minimizes my equation so in terms of instead of using the F which brings a F equals to zero we select the F had an estimate of F which brings which minimizes a F and then we can compute approximation of F of my fundamental matrix using the singular vector corresponding to the smallest singular value okay so just again as an illustration if I do my SVD a is u times D times the transpose so this is my diagonal diagonal matrix and these are my rotation matrix and matrices and V contains the singular vectors of the matrix a so this is kind of my original matrix a it's a nine by nine matrix so nine here and down here used n by n matrix D is an N by nine matrix we have here the diagonal elements containing the singular values and there are nine of them and then we have a nine by nine matrix which is this matrix V transposed where I take out the the corresponding singular vector so the singular value is here on this diagonal they are actually sorted so it's what it was traded here and so I take the last one because it's a smallest one and then this is in this case the ninth element so d99 and then the last row here of V transposed gives me the singular vector and depending if you are if your Mathematica toolbox returns your V transpose then it is here the last row or if it is V then it is of course the last column and this is your estimate of your fundamental of the entrance of your fundamental matrix but it's not the one which brings it actually to zero but is the one which is close to zero so let's see there's only an approximation what happens if we try to bring this into an algorithm and just kind of sketch this here in MATLAB so we have our function where I want to compute F from my point pairs so I've an X prime X double prime as years thought in this variables and the first thing we need to do we need to build up our matrix a and we can very easily do this by just iterating over all points that I have some iterating over all corresponding points and simply build up this matrix a and I just do this because matrix the matrix a contains the coefficient vectors which I can easily using the Kronecker product here with the function cron which built the Kronecker product of X double Prime and X Prime and so built up my matrix a so just with this tiny small for loop I've already built up my matrix a then I need to perform my singular value decomposition so I do you the V equals SVD of a and then I simply take the matrix V and take out the last column of V 2 I do here the ninth column of V and I kind of reshape it back into a matrix into the fundamental matrix and then F is my fundamental matrix and I'm done so this is kind of the singular value a versus a single vector which has a corresponds to smallest singular value the problem that I will have however here it's only an approximation of my fundamental matrix right because it was not solving a F equals to 0 so as a result of this this matrix F that I have generated is not necessarily a rank two matrix which I expect the fundamental matrix to be right so maybe this matrix F that I actually computed here doesn't has rank deficiency and thus it has no null space and again as you have learned the null space of this matrix F is something and important with respect to the epipolar geometry for example so what we now want to do is we want to actually change this matrix F a little bit as little as possible but make sure it has a rank deficiency right so we want to enforce the rank two constraint in this matrix F so I want to manipulate this matrix F so that after this manipulation it is as close but as possible to the matrix F that we have computed but that this has a rank two constraint and I can do this again by using the singular value decomposition so remember the singular value decomposition generates this decomposition where on the diagonal of this matrix D in the middle I have the the singular values and they are sorted in descending order so the largest one second largest one and the smallest one over here and so what I Bank do is I can take the smallest singular vector value and actually set it to zero and this is an approximation of F which enforces that we have the rank deficient that we have a rank of two of this three by three matrix because I'm just setting the smallest singular value to zero and such kind of generate the ranked efficiently in the area where the where this actually has the smallest an extent so to say so what we do is we compute a second singular value decomposition but now not of my matrix a but I compute the singular value decomposition of my matrix F so it again leads me to you the V transposed and what I'm doing now I'm building up a new matrix F we are take the diagonal matrix here and put this last d3 three so the third thing you Laval you the smallest one I put it explicitly to zero typically it's a very small value but I'm making explicitly zero so that this matrix will be of Rank 2 right because those two values are larger than zero this one is zero so this will lead me to rank two matrix and then I just build up my matrix F again from taking U and V that I got through the singular value decomposition that just changed the diagonal matrix in the middle by setting the last and smallest eigen value to zero so now I can continue with my algorithms so FA is my a kind of approximation of a this was the where we stopped actually before and then I'm doing another singular value decomposition of this approximation of F now decomposing this approximation of F that I have computed in Europe you ad a and V a so this is a decomposition of this matrix and the F that I've computed and then I'm generating my new fundamental matrix basically re stacking it together from these three vectors but what i'm doing here i'm setting the smallest eigen value similarly oh sorry to zero which you see the last element so by if with this zero I'm generating matrix which is of rank two so as a result of this this is my best approximation of the previously computed fundamental matrix which makes you make sure that the result has the rank two constraint that the fundamental matrix must have in order to be a proper fundamental matrix so and will this algorithm I can compute the fundamental matrix based on eight corresponding points and this gave this algorithm the popular popular name eight-point algorithm and it's still today very easy to implement and very popular algorithm for estimating the fundamental matrix from eight corresponding points and practice however have to take care of as small of a few small details if I actually do this and uses in real world system and that is the fact that my system may not be well conditioned it depends on how the points actually spread and how my coordinate system looks like so if I'm working in a pixel coordinate system let's say I have a 12 megapixel cameras of 4000 pixels by 3000 pixels over here and then I have my coordinate system which sits somewhere over here and I have some points which have high pixel values like 1800 1400 which would be sitting somewhere over here and then I have a third of the third dimension here which comes from the normalization of the homogeneous coordinates which takes a very small value so I have very big differences in the individual dimensions and this can lead to numerical problems on American instabilities if we solved this system therefore it is better to actually change the coordinate system that I'm using they are represented they represent my my points in in a system so that all the dimensions roughly take the same values so and what I can simply do I can basically simply scale this image to a new coordinate system so that this coordinate would be 1 1 1 so basically scaling the X&Y coordinate so that I get values appropriately here so this would be in this case it's actually the same scaling for x and y so that two thousand pixels correspond to a value of one so I would get the coordinate points your 0.9 0.7 and 1 and if I just move this move to this new coordinate system where I'm just kind of scaling the x and y coordinates I get a system which would be better conditioned and then I will have less numerical issues over here so this normalization of the these point coordinates substantially improves the stability of the system and we basically make the transformation that the center of mass of all the points that I'm looking into sits in 0 0 and then scales the X Y values so that the coordinates are within minus 1 and 1 and as a result of this because I'm adding this additional dimension from the homogeneous coordinates which is one the normalization constant all the other values should be in that order so if I do this I can very simply express this with a geometric transformation which there's T times X so this more regional coordinate leads me to a new coordinate which is the headed one so that the coordinates are 0 centered and in the size of minus 1 and 1 so depending on of the points are spread in my image I'm doing this normalization that and they are all appropriately appropriately normalized between minus 1 and 1 and then I can do exactly the same step as I have done before so determining the fundamental matrix but what are then need if I after I computed the fundamental matrix this fundamental matrix is in this different coordinate system actually need to transfer it back what we can see is that if we have our original points x prime transpose f x double prime and we transform our points i basically have this transformation inverse sitting over here so that if i'm kind of executing this was my new coordinates I can so expand that from transposition operator here I basically have my transformed coordinates sitting here and here and then a combination way I have this transformation matrix the fundamental matrix and the transformation matrix so this means I can transfer Fortin back from this normalized or it's called normalized fundamental matrix by exploiting my my transformation can do it for them back so just kind of here to show if you do this transformation you to make sure you afterwards transform your fundamental matrix so that you kind of undo that scaling that you did before in here and kind of the shift of the of the pixel coordinates so it's nothing nothing very special about something that you that's highly recommended if you do this - what numerical instabilities when solving your system there are few more things we need to take care of if we use the eight point algorithm in order to compute a solution the reason why we need to take care is that there are certain configurations which we cannot handle with a logarithm so there are singularities that we obtain for example similar to the computation of the DLT if all points lie on a plane so if all points lie on a plane is very similar to computing the TLT we get the result that the rank of this matrix a may be smaller than eight and if the rank of a is smaller than eight we don't get an appropriate solution over here so we get you can see this example so these are images from this famous fundamental matrix song where you have corresponding points on the Harbour Bridge in sydney and you if you look on top of this you can see they perfectly lie on a plane and will actually generate 2d generated solutions so even if your points are not perfectly on a plane but even close to a plane you get numerically in stable solutions and this doesn't give you the result that you want to have there's a second thing we need to take care if we want to compute the relative orientation or the fundamental matrix and using this algorithm and this is if we have no translation only rotation so if you only rotate our camera on the spot and we do not translate the camera and then we may also run into issues so if you only rotate our camera this means the projection centres are both the same so let's see what happens so I have let's say these are my scene points and this is the camera image of the image or the field of view that camera one generates and this is the field of view that the camera two generates then the translation between both of the images is zero and as a result of this I also will have a singularity that the that the the the fundamental matrix cannot be reconstructed in this case of zero translation so it's also something that can be something tricky I mean we need to take care of if you want to track the motion of the camera and the camera is moving still or is only rotating on the spot then the eight point algorithm will not be your friend and you need to make sure that you capture those situations explicitly and estimate them the rotation for example separately not using the eight pro diagram so to sum up this half an hour that we have talked about the computing a direct solution for the fundamental matrix what we have done we have provided a solution which uses the cop linearity constraint and that allows us to estimate the fundamental matrix if we have at least eight corresponding points and there is a direct solution that means we don't need to iterate and we don't need any initial guess you can just put our corresponding points in and the eight point algorithm will rotate us rotate obtain us the fundamental matrix and we have basically done this by setting up a system of linear equations or homogeneous system of linear equations solving is through SVD and we additionally need to ensure that the resulting matrix that we get heads of rank two which we again you did by exploiting the fundamental matrix and again if he applies this to practical situations we need to make sure the coordinates that we have are roughly in the same area in the same size or something therefore we rescale them to four minus one to one so it all coordinates of Mohammed genius vector are approximatively of the same size so we get mu numerically better solutions in addition to that there are similarities like no translation only rotations or if all points lie on plane so these are situations where we don't obtain the solution for the fundamental matrix but with this approach in this algorithm that we have generated you are able to implement your own estimation of the fundamental matrix the next thing I want to do is look how does this change if we go to the essential matrix so again the essential matrix is basically the fundamental matrix for calibrated cameras so so far we assume we don't know anything about our cameras in terms of the calibration matrix but that's not true in reality in most situations when we do geometric reconstructions we actually calibrate our cameras beforehand because it simplifies our problems so we now assume we work with calibrated cameras from now on and so the essential matrix is then the fundamental matrix for the calibrated camera it's easy more easily written down because the calibration matrices are gone they are not part of this matrix anymore because we assume all cameras to be calibrated that means in the journal from the essential matrix is our the rotation matrix of the first camera the screw symmetric matrix encoding the baseline vector and the rotation matrix of the second camera transposed and in the general parameterization them off depending images and this is a parameterization that we are typically using it even simplifies further because then the rotation matrix of the first camera is the identity matrix because the coordinate system is defined based on the first camera and then we only have the rotation matrix of the second camera which is the only rotation matrix which is then in here and describing how the second camera works and we can use the essential matrix to formulate our Copan arity constrained it has exactly the same form as for the fundamental matrix the only difference is that those points here are now in the camera coordinate system so all the calibration parameters or the calibration matrix has already been applied to those coordinates when they're coming from the from my chip from the sensor itself so we we assumed to work here in the camera coordinate system therefore we don't need to take the the calibration matrices into account you can see already that the form here is more or less the same myths for the fundamental matrix so you can basically apply the same algorithm or very very similar algorithm that we used for estimating the fundamental matrix to also estimate the essential matrix so we try to compute the central matrix from 8 or more corresponding points so the same as before we can use our crop linearity constraint in order to get exactly the same coordinates the 8 the same equations and the only thing we need to we need to take care of now that these coordinates here are in the camera coordinate system so we assume the camera calibration matrix already being applied and we obtained this by just taking the inverse of the calibration matrix and multiplying it to the pixel coordinates that we get in our image this gives us our pics the coordinate in the camera coordinate system and then we only work in this camera coordinate system so everything what happens from now on isn't the camera coordinate system and then we don't have to take you have to care about our calibration matrix anymore so also when we then estimate the elements of e the degrees of freedom are reduced to 5 degrees of freedom because we only have the rotation matrix in the direction of the basis vector in there the calibration parameters are not available anymore and then exactly the same happens as for the fundamental matrix we again have exactly the same form that sits over here the only difference is here we don't have the 1 in here we have even the the camera constant sitting in here and then we are we are setting this equation to 0 and we compute our our algorithm or eight point algorithm exactly in the same way as we did that for the for the fundamental matrix and we only do the first part over here until we've compute the first approximation of e and this consists of again building the matrix a it's perfectly identical doing the singular value decomposition the only difference is now it's not the variable F it's a variable e because it's the essential matrix but it's obviously exactly the same except of rename the renaming the variable and we built our matrix II and only the last lines will change now because of course the essential matrix has different constraints compared to the fundamental matrix so the key question is which constrains to consider here so for the fundamental matrix we said we said the smallest eigen value a singular value to zero and we do something that's something we also need to do for the essential matrix because also it's a rank two matrix what we have to do more so for the fundamental matrix we created the set up where says F is u times D times V transpose with the two diagonal elements in here and for the essential matrix we have exactly the same set up only that we have here the same singular values and given that the homogeneous system we can actually set them to one because it's only defined up to a scale factor so this is the constraint that we exploit in here in order to put the essential matrix in the form that it takes the constraints that need to be taking into account for an essential matrix and this then allows us to put this constraint into our eight point algorithm so the eight point algorithm for computing the essential matrix looks until here until here actually exactly the same as for the fundamental matrix so we build our matrix a our coefficient matrix we solve a e equals to zero using the SVD and by building our approximation of the essential matrix the same way as we did before the fundamental matrix we then compute again a singular value decomposition of this approximated essential matrix and then build it up again by exploiting the constraints that must hold for the essential matrix that means that the first two eigen values singular values are 1 and the last one is 0 to enforce the rank deficiency and these are built from the constraints that must hold for the essential matrix so this is the only difference this last line actually only these two blocks over here which changes the eight-point algorithm of the fundamental matrix into the eight-point algorithm for the essential matrix so it's very very similar and also the conditioning we have to do in exactly the same way as we did that for the fundamental matrix so we also need to change our coordinates at the point coordinates so that the points align for example between minus 1 and 1 and that the points are centered in 0 0 so it's basically shift and a translation so it's a it's a shift in a scaling of the pixel coordinates nothing else and then exactly in the same way as we have done this for the fundamental matrix we have to apply with transformation to the essential matrix so that we can map forth and back from these things so we're basically estimating this essential matrix is Yi hat and then have to apply the transformation to turn the estimated essential matrix into the essential matrix that actually works with the real image coordinates that we have so this just kind of step that we need to transform the points before and need to transform the resulting essential matrix back and that's basically it there's nothing more to that so a few words about the properties of the essential matrix so we've seen it in homogeneous matrix it's a matrix which has a determinate of 0 is Rank 2 similar there to the fundamental matrix and it has two identical nonzero singular values these were the ones we had set to 1 we can even set them to 1 because it's only defined up to a scaling constraint and then there exist other constraints that as a result of the skew symmetric matrix that is involved that the the central matrix must fulfill this equation and so this has not been explicitly explained in here but this is related to this constraint here that we used when setting up the fundamental matrix not going to the details you find in both conference book the duration for that if you're interested in that how that is related but I decided it's fine with us that we know that we have this constraint over here and enforce it when computing the essential matrix so we are now have also used eight points for computing the essential matrix but the essential matrix has a smaller degrees of freedom so the fundamental ratio at seven the the essential matrix has five parameters that we need to fulfill so in theory there should be an algorithm which requires less points so for the fundamental matrix there should be algorithm which requires only seven points and actually that is the case there's a seven point algorithm but it's slightly more complicated and then the eight point algorithm and for the essential matrix even more extreme because we only have five parameters that for the for encoding the the parameters of the relative orientation so there is actually a five point algorithm and going on from eight to five points can have substantial edge advantages especially if you compare this approach with rancic if you need to reject outliers because then the number of points you need to sample gets much smaller so there is an algorithm and so-called five-point algorithm or Nestor's five-point algorithm that requires only the minimum amount of five points in order to compute the relative orientation so it's an algorithm that has been proposed in 2003-2004 by mr and it became the standard solution today for computing the relative orientation for the calibrated camera so basically all approaches that estimate the translation and rotation of from single camera images will internally use misters 5 point algorithm in order to compute the relative orientation it boils down to basically solving a polynomial of degree ten so you end up with having ten possible solutions and it's it's somewhat more complicated to do again I recommend you the photogrammetry to script from both come first no or the book or also misters work on the five point algorithm if you want to dive more into the details but it's kind of one or two pages of their relations how you come up with your polynomial of degree ten how you were actually solving or using that why is it so important again to have five points instead of eight points the reason is if you have outliers in your data and if you just compute your correspondence ease without any further information maybe just based on your feature descriptors you will make errors near the Association and having errors in the data Association which actually lead to wrong estimates of e so what you're doing you're Sam you're using R and second you're sampling five correspondences assume them to be correct compute the essential matrix out of it and then take all the other points and see if they agree with the transformation you've computed for example using the COPE linearity constraint and see if they actually brings all the other correspondences to 0 or you count those which you bring to zero or close to zero and then you basically repeat this process but the problem is you need to sample those five points in the beginning which are in liars and it's much harder to sample eight points and all of them are in liars then sampling five data points and all of them are in liars so as the result of them the smaller they the number of points that you need to sample as outlier free in rennes ik the more efficient that approach is and therefore it is important to have this five point algorithm because it is inside use inside rancic a lot and so you can reduce the number of trials you need forensic substantially if you have only five point that you need a sample rather than eight points that you need to sample and again here for the references if you want to dive more into the details of the five point algorithm standard implementations for the five point algorithms are for example available in the opencv library we can just use the five-point algorithm by mr and get your relative orientation out of five corresponding points so now that we have computed the fundamental matrix at the beginning or now the essential matrix assume we stay with our calibrated camera what we want to do is we want to actually get out the rotation matrix in the basis vector of my essential matrix because typically we want to estimate the trajectory that the camera takes so we actually want to know how did the camera actually move so what we need to do we need to compute the actual orientation parameters out of this matrix so we need to take E and turn it into a rotation matrix and into our baseline back door of course based on vector can only be determined up to scale so typically in unit length 1 but that's what we want to do now so computing the orientation parameters out of Anna gave an essential matrix so in short we have E and we want to obtain the screw symmetric matrix of the baseline or in the baseline directly and the rotation matrix and the first question we need to answer ask ourselves is this actually possible and if they're actually unique solutions or are there more than one solution and if I posed the question like this is there a unique solution then it's rather obvious that there is some other solutions but these solutions as we will see which are physically impossible but which which which can be explained geometrically but which are not physically plausible given the cameras ever typically use and therefore we can actually reduce them to one solution but we have to put a bit of brainpower into that in order to do that so what we have is we have our camera number one we have our camera number two we have our direction vectors and we know our points intersect the race must intersect somewhere because here is one of those corresponding points so there's an example for one corresponding points generated from the two camera images and that's the solution that we want right so the point lies nicely in front of our camera and and that's where the cameras are serving the point so that's exerts the solution that we actually want to have so this baseline vector B and the rotation how camera two is rotated with respect to camera one let's result that we want to get from a mathematical point of view however there are more than this physically plausible solution and the reason for this is we know B only up to a scalar right so only up to a scale factor they so see how much news property so let's multiplied with minus one if you multiply that was minus one it basically flips the vector into the other direction that's fine but this simply leads to Direction vectors to the two the vectors up to the pixel coordinates which are different because then I need to swap the two cameras so as a result of this the Rays port in this direction and if I extend those rays I will actually see that the intersection of those two rays happens behind the camera so it's basically if I'm looking forward like towards you with two cameras it would be that of course the intersection of the corresponding point is behind me which is something which physically doesn't make sense because the camera is looking forward but from a mathematical point of view how we have modeled our camera this is a possible solution the other solution that we can do is we can actually take this setup that we have here and we can actually simply rotate the second camera by PI or letting this vector actually point to the opposite direction which is equivalent or so it so basically can be pressed as a rotation around here so if I do this then or flip the second one down here I get intersection which is here further far away and that one camera is basically looking backwards or I can actually do both and then I again have the intersection of the points behind me so in the end there's only one solution that I have the one solution from physics that I get out here that both cameras are looking into the forward direction and the intersecting point is there section is in front of both cameras and that's actually what I want to have not that one point is in front of one camera and behind the other camera or that the point is even behind both cameras which is something which which doesn't make sense from the physics point of view the question is we want to have that solution how do we obtain that how do we identify that the four from the four solution that I get out from the math from the math point of view which one is actually the physically plausible solution it's something that I need to do so we can actually come up with an algebraic solution and for actually generating those for those four setups that we have just described here and then these are basically four essential matrices that I can compute and then I need to check for which essential matrix the point actualize in front of camera and where lies in the back of the camera okay so let's start there's the solution where Harlan says I describe the hardly a sermon so we have our essential matrix which is U which is a rotation matrix we have our diagonal matrix which has 1 1 0 on the main diagonal and my second rotation matrix which one's post and from the SVD I know that these two are rotation matrices right so what I can now do I just define two matrices again these are kind of god-given matrices this is a matrix zette and a matrix w which I just write down it's not that you derive them it's just kind of if you know those matrices you will see that we actually get our solutions so the one matrix the matrix there is a screw symmetric may excuse a metric matrix so which has just here this element 1 and minus 1 everything else is 0 and I have a rotation matrix which has minus 1 1 here minus 1 here and B and the one over here so it's a rotation around the and that axis ok so if I multiply this matrix that with this matrix W I get a matrix out which has 1 1 0 on the main diagonal and it's 0 everywhere else ok so the product of Z W gives me this result which is sick exactly the D matrix that I had in here in the decomposition of my essential difference okay so what I can do is I can put Z times W in here and write the essential matrix as u times Z times W times V transposed and that's an equality right so I can write E as u times Z times W times V transposed and haven't changed anything given that I know that U is a rotation matrix and I know that the rotation matrix times posed by a rotation matrix itself gives the identity I can add u transpose U and actually put it in here so between those two matrices I can actually write the identity and which I then can expand to you transposed you knowing that use rotation matrix so this can this is U times that times u transpose times u times W times V transpose so basically just adding matrices in here so that they sum up being identity matrices for x right so I haven't changed the system at all but now we can see it we can recognize that there's actually a trick has happened so what I have here in here I have the rotation matrix times a screw symmetric matrix times the rotation matrix transpose so again the inverse of the rotation matrix so this element uz u transpose will actually stay as whose skew symmetric matrix and this part over here is the rotation matrix times the rotation matrix times the rotation matrix so this also a product of three rotation matrices will stay a rotation matrix so this is a skew symmetric matrix and rotation matrix which exactly the form of the essential matrix in the general parameterization of dependent images so by performing u times that times u transpose I get my skew symmetric matrix and by multiplying u times W times V transpose I actually obtain my rotation matrix and so the good thing is I know you I know that I know you transposed I know you I know W and early transposed u and V I got from my singular value decomposition and that and W I just defined myself so with this set up I actually get a decomposition into the SU symmetric matrix and the rotation matrix and SAS I can decompose my central matrix into the rotational matrix and into the basis and that's exactly what I wanted right so this is the first solution that I get and now what I can do is I can again look to this trick that I have done here of writing this 1 1 0 matrix by the su symmetric matrix and the rotation matrix and what I now can do is I can actually come up with four different ways for writing this down by basically computing the so in in terms instead of zet this new symmetric matrix I can transpose it and put the minus sign in front of it so the negative transposed or the su symmetric matrix is a skew symmetric matrix itself so nothing changes in here so or what I can do is I can transpose the the the overall matrix so I can know so here I can compute the negative of the sku symmetric matrix and transpose my rotation matrix performing the rotation into the other direction and the transposition of this crew symmetric matrix and the matrix W and when I execute those operations here if you sit down and do it yourself by hand you will see that you always end up with the matrix 1 1 0 and so these are four different ways how I can generate the same matrix which gives 1 1 0 in a product but I get four different results for the rotation matrix and the skew symmetric matrix if I do the decomposition so again I can this is what we have done so far and I can do the same for this solution for this solution and for this solution so as a result of this I will simply get multiple variants or two variants for this screw symmetric matrix and two variants for the rotation matrix and actually specified down here so one is you that you transposed or you z transposed you transposed so in this example the skew symmetric matrix is transposed and for the rotation matrix I get u WB transposed and that before or u times W transpose times V transposed and by now taking all those combinations that I can get actually get my four solutions out he one was the solution I was describing before and then I've eetu III and II for those where parts of the you or that matrix are actually transposed and this is my four solutions so I get my four solutions for the essential matrix I can we compute them then I compute my rotation matrix and my skew symmetric matrix and then I need to check at the point in front of the camera or behind the camera and so in the solution policies around says yes we can actually do this compute the singular value decomposition of e so that I get you the envy what do you name they need to do you need actually to normalize them to make sure they're pointing to the right rate so they depending on how your SVD is implemented you're basically normalizing V and U and then you are you can construct your four solutions you that you transposed use a transpose u transpose and as we had it before then you build up you try all four combinations or all four solutions that you get out and basically test extract the rotation matrix extract the two symmetric made through symmetric matrix and see in which configuration are the points in front of the camera and in which situations are the point behind the camera and then only one solution should survive which is the only physically possible solution okay so with this you what you have to do is you have to resolve the ambiguities that resulted from the homogeneous setup that we have coming up with this four solution that the baseline vector can be flipped I can do this a rotation of one of the cameras and then looking backwards this is just because we can scale our our well our vectors with minus one and then they're actually pointing into the other direction and but with this setup that I've shown over here I'm actually able to determine in which configuration I am in and then come up with my solution so to sum this whole lecture up what we have done is we have provided algorithms to compute the relative orientation just purely from image data so we do not need to know knowledge about the scene in the sense that we know the three locations of point in the scene as we may need to know that for States or sectioning or for computing video TV where we need to know 3d points in the world this is not the case here here we are only working on correspondence these it means the information that one pixel image number two has a corresponding pixel in the other image and this allows us to um estimate the camera motion except of the scale so if I have a calibrated camera just through the secrets of correspondences that are compute from image 1 to 2 2 to 3 3 to 4 and so on I can actually estimate the movement of my camera but only up to a scale factor so I'm not sure how long the baseline vector is only know the direction of the baseline vector and the rotation of the camera but this is the basis for something that we call visual odometry where we want to estimate the movement of the camera in the 3d world and either add some scene knowledge or as some other sensor like an IMU or maybe a GPS sense or or some other centers which provides me with this scale information and then I'm actually able to estimate the movement of the camera actually in our physical world with all six degrees of freedom at the rigid body transformation the solution that we presented here have been direct solutions so for the fundamental matrix we require eight or more points the so-called eight point algorithm for the fundament for the fundamental matrix yes for the essential matrix we used exactly the same algorithm the so-called eight point algorithm we just have to define a different constraint that holds for the essential matrix for the two larger or identical singular values and then we have very briefly sketched or I basically told you about the existence of the so called misters five point algorithm where you can do that with five points and which is kind of the gold standard approach today for coming up with the relative orientation of the calibrated camera so again all the solutions we presented here are direct solutions so that means they don't require initial guess but they are statistically not optimal in the sense that they for example do not take and the uncertainty ellipsis into account so how certain you are in localizing a point for example but what we can do is we can actually extract the baseline vector and the rotation matrix out of the essential matrix using the solution between system and that we have just presented and what happens these direct solutions are often used in combination with the rancic algorithm for handling outliers so for splitting up our correspond set into an inlay aset in into an outlier set and then we forget about the outliers and work with our entires and those in Liars are then used in the result of the of the direct solution which serves as an initial guess for further computations and the further computations could for example being least squares approach which only uses in Liars then may take uncertainty information into account in order to come up with an even better estimate of the rotation matrix in the translation matrix and maybe even uncertainties which are associated to that so that I know how precisely I know my individual parameters and I can for example weigh certain points more if I'm more certain about the measurement process of obtaining those points these are all things that I can add in a subsequent least squares approach but the direct approach that I've presented here has its two key things first it provides the initial guess that the least squares approach requires so badly and the second one is that it allows me in combination with rancic to separate the points that into in liars and outliers and this is the key basis for also making the least squares approach converge to the right solution that we have an outlier free point set in here that way and therefore these direct solutions are the basis for computing the so called visual odometry which is kind of the chained-up orientations of my camera and this way describing the motion of the camera into again if you want to dive further the photogrammetry computer vision book chapters twelve point three would here be the starting point for diving into the fundamental matrix and essential matrix and I also record would recommend the reading by Hartley in defense of the eight point algorithm way points out that the eight point algorithm has certain strengths and advantages over other algorithms requiring less points although we need to have eight corresponding points or if you look into the five point algorithm this is here the work Bannister's group where you are actually able to estimate the relative orientation of the camera just from five corresponding points which is today kind of the gold standard approach for doing these tissues and with this I hope I gave you an idea how we can use camera data to estimate the motion of the camera just by taking pairs of images that have been recorded basically assuming a free-floating camera through the environment which allows us to estimate wares for example mobile robot or an autonomous car equipped with cameras in the next point in time and we can figure this out up to a scale we may need to add the scale information from GPS IMU Adamo tree information or whatever we have which then allows us to actually put the scale into place Willis I thank you very much for your attention and I hope you learn now how you can also use cameras to actually estimate information about the 3d world and the motion of your popcorn thank you very much
Info
Channel: Cyrill Stachniss
Views: 7,947
Rating: undefined out of 5
Keywords: robotics, photogrammetry
Id: zX5NeY-GTO0
Channel Id: undefined
Length: 62min 46sec (3766 seconds)
Published: Tue Apr 21 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.