The Numerics of Bundle Adjustment (Cyrill Stachniss)

Captions
Welcome everyone, it is my great pleasure to welcome you here to bundle adjustment, part two. Today we want to look into the numerics of bundle adjustment. I assume you have attended the first part of the bundle adjustment lecture; now we want to look into the numerical tricks that we can apply in order to solve the system of linear equations that arises in the least squares error minimization at the heart of the bundle adjustment problem.

Before we dive into those numerics, give me three to four minutes to rehearse what I talked about in the previous lecture on bundle adjustment, so that we are all on the same page. Bundle adjustment happens, for example, if I fly with a UAV over an environment, take images, and build a map of the environment, as indicated here: these are the different image planes virtually displayed over the ground, and the point cloud on the ground consists of 3D points whose locations were computed from those images. That is exactly the problem we are tackling. You can also see this in a more schematic illustration: these are the different locations of the aircraft, and on the ground there are certain points that you are observing from multiple camera images, as well as some control points for which you assume to know the 3D coordinates. The question is now: how can we determine the locations of the cameras as well as the locations of the points in the world? So we need to estimate a three-degree-of-freedom vector for every point in the world and a six-degree-of-freedom vector for every camera, taking into account its position and orientation, and maybe even more parameters if we take calibration parameters into account as well. What bundle adjustment does is solve this task in a least squares fashion: we set up a least squares problem, which leads us to a system of linear
equations that we need to solve, and this provides us with a new estimate of our camera orientations and our point coordinates. As this is a non-linear problem, it is something we need to iterate. So the key idea is: we start with an initial guess, take the 3D points of the world in our current estimate, project them into the images where we think the cameras are, and compute basically a virtual image, so to say. Then we minimize the error between the actual image and this virtual image, or rather the reprojection error: where is the point reprojected to in my virtual image, and where was the point actually measured? This discrepancy is what I am trying to minimize in a least squares fashion.

We can set that up in the following form. We have here the coordinates of a point in an image, where i is the index of the point and j is the index of the image, so this is the 2D pixel location of a point in an image. Then we have our corrections applied to it, and on the right-hand side of the equation we have a scale parameter, resulting from the fact that we are living in a homogeneous world here, so every equality is only defined up to a scale factor. Then what we have over here is the 3D coordinate of the point in the world, point i, and the projection matrix of camera j that projects the point i into image j, onto the image plane. This matrix involves a couple of non-linear parameters: the non-linear calibration parameters as well as the orientation parameters that need to be taken into account. So we have the scale factor, our projection matrix, and our 3D point, and the projection matrix has several parameters: the projection parameters, i.e. the extrinsics and the intrinsics, as well as some non-linear distortion parameters I need to take into account, which may even be pixel dependent, so depending on where a point is projected to in the image, I need to apply a different correction. And this all
happens under an uncertainty in the image coordinates, i.e. an assumption of how precisely I can pin down the points in my image. What this equation also does is assume known data association, something that may not be directly clear if you see it for the first time: the index i of the point in the world is given, and I have the index of this point in the image as well, so I have all the IDs linking the point in the image and the point in the real world. That means I have a known data association: I know which point is projected to which pixel location, and there is no ambiguity in the data association.

So the unknowns that we need to tackle here are: the coordinates of our 3D points in the world, that is, the new points, the points which are not control points; the scale factor, which occurs for every point in every camera image, so a huge number of those parameters; the six-degree-of-freedom exterior orientation; and the five-degree-of-freedom interior orientation plus some non-linear parameters. This can lead to a potentially large number of parameters, and the problem is especially the scale factor, which is the parameter that occurs most often, and it is actually a parameter I am not really interested in anymore, because this factor doesn't tell me anything in my Euclidean world; it only exists because I am operating in homogeneous coordinates. The result of this is one of the few cases where it makes sense to go back from the homogeneous world into the Euclidean world and do the minimization in Euclidean parameters. I can achieve this by taking the last row of this vector-matrix multiplication and performing the homogeneous normalization: I take the first two rows and divide them by the third row. This is the Euclidean normalization, going back from the homogeneous world into the Euclidean world, and as a
result of this, everything happens in the Euclidean world. In the example that we discussed last time in the lecture, this helped us to go down from 13 million unknowns to 3 million unknowns, just as an example.

If I now have this equation, I can use my standard least squares approach, the standard procedure that we know: we have our unknowns and our observations, and we set up our normal equation system with the coefficient matrix (or Jacobian) A, the covariance matrix of the observations (or rather its inverse, the information matrix), our unknowns, and our observations. This leads me to the update of my unknown parameters in this form. This is basically what I need to do, and in order to solve this least squares problem, I need to solve a system of linear equations. The problem is that this can be a very large system, as we have seen last time, and this brings us to the question of how we can actually solve it. This is where this lecture, part two of the bundle adjustment lecture, starts: looking into the numerics of the bundle adjustment problem. Why is it challenging to solve such a system, and what structure or effects can we exploit when solving it in order to do this in an efficient manner?

As we have also seen last time in an example, which I will repeat in a second, we cannot solve the system of linear equations that pops out of bundle adjustment problems of a typical size in a straightforward manner. Why is this the case? The reason is that the resulting systems are huge: we have a large number of unknown parameters and a large number of observations, and the numbers are typically too big to solve this, at least in a reasonable amount of time on a standard computer, if we don't use any tricks. Just as a small example of how this could look: assume we have 20,000 images that we have recorded, we have 18 points per
image, so a rather small number of points compared to what today's typical image feature extractors, such as SIFT, would produce, and we assume that we see every point three times, so we have three observations per point; every point pops up on average in three images. This means we have 120,000 points overall for which we want to estimate the 3D location, and 120,000 multiplied by 3, because each has an x, y, z coordinate, gives us 360,000 location parameters. We also have the camera orientations: we have 20,000 camera images, and every camera has an exterior orientation with six degrees of freedom (X, Y, Z, yaw, pitch, roll), which gives us 120,000 orientation parameters. 360,000 plus 120,000 gives us 480,000 parameters. And we have 720,000 observations, because we have 20,000 images with 18 points per image, and we observe an x and a y coordinate for each. Note that in this calculation I excluded the intrinsics, the calibration parameters, because their number is typically small, maybe around 10, so they don't really play a role here. So we have roughly half a million unknown parameters and roughly 720,000 observations. This leads to a Jacobian matrix with 3.5 × 10^11 entries, and even the normal equation matrix has 2.3 × 10^11 entries, and that's huge. If you assume double values, then even storing the Jacobian requires something like 2.8 terabytes, just for the entries of this single matrix, without even having done anything with it. 2.8 terabytes is a huge number; that's maybe something your computer can store on its hard drive, or maybe even in memory, although that's very unlikely. You see that this is a big number, probably too big to solve in a reasonable amount of time. So we need to think about how we can actually do this, how
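As an aside, the bookkeeping above can be reproduced with a few lines of Python; the numbers are the ones from the example in the lecture (20,000 images, 18 points per image, every point seen three times):

```python
# Back-of-the-envelope size of the bundle adjustment system:
# 20,000 images, 18 points per image, each point observed in 3 images.
n_images = 20_000
points_per_image = 18
obs_per_point = 3

n_points = n_images * points_per_image // obs_per_point  # 120,000 3D points
point_params = 3 * n_points                              # x, y, z per point
pose_params = 6 * n_images                               # 6-DoF exterior orientation
unknowns = point_params + pose_params                    # 480,000
observations = 2 * n_images * points_per_image           # x and y per image point

jacobian_entries = observations * unknowns               # ~3.5e11
print(unknowns, observations)                            # 480000 720000
print(f"{jacobian_entries * 8 / 1e12:.1f} TB")           # ~2.8 TB as doubles
```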
can we exploit certain properties that the bundle adjustment system has in order to bring those numbers down? This is too big to be solved as-is for practical applications.

In order to understand the underlying structure of the bundle adjustment system, let's see how we typically acquire images. We start here with the acquisition of images from an aerial vehicle, but very similar effects also hold if you just take a camera, walk around a scene, and record it. The structure here is somewhat simplified, which makes it easy to illustrate the entries of the resulting matrices; that is why I am doing this. Similar structures, not as symmetric but of the same kind, will also occur if you take your camera and walk around a scene recording images, or drive a vehicle through a city and record images to build a 3D reconstruction. Not much changes in this sense if you don't have that flight pattern, but the flight pattern helps us to understand the structure quite well.

So again, this is the example that we used at the end of the previous lecture on bundle adjustment: we have a UAV that flies over the ground, there are seven times seven, i.e. 49 points, of which four points, those in the corners, are control points for which I know the coordinates; all the other points are new points. I am flying over the scene in three stripes, recording seven images per stripe, so three times seven, 21 images. The first image over here covers six points, namely points 1, 2, 8, 9, 15, and 16; then I basically shift the images by one point over the scene until the last image of the stripe, which sees the points 6, 7, 13, 14, 20, 21. Then the airplane turns around and records the images of the next stripe, and that's basically how I obtain my images. So I have three stripes with seven images per stripe, which is here an overlap of 60%/20%, so a
comparably small overlap for UAV mapping if you want to generate high-precision maps. This means we have 21 images and 49 points, of which four are full control points; these are the points in the corners for which I assume to know the X, Y, Z coordinates, so known point coordinates. That means we have 45 new points, all the other points, for which we have to estimate the 3D location. That's the underlying estimation problem: these are the images that we have, and we want to exploit this structure and illustrate the resulting matrices, such as the Jacobian matrix or the normal equation matrix, based on this example.

So what do we have? We have six images with six points each: the images at the ends of the stripes, so image 1, image 7, image 8, and so on, have six points per image, and all the other images have nine points per image. So 6 times 6 plus 15 times 9 means we have 171 image points, and we have two observations per image point, so 2 times 171 gives me 342 observations. Our unknowns are the 45 unknown points with three parameters each (X, Y, Z), giving 135 unknowns, plus the orientation parameters, 6 times 21 for the 21 images, giving 126 unknowns. In sum, we have 261 unknown parameters and 342 observations. This is a much smaller problem, one for which we can still illustrate and inspect the structures of the underlying matrices when we build up the linear system.

In order to make that a little easier, we rearrange our variables a bit. We do the following: we start with our observations, our corrections, our Jacobian matrix, and our delta x. What we can do is look into this vector x, and the
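The counting for this small example can likewise be checked in a couple of lines (same numbers as on the slide):

```python
# The small 3-stripe example: 21 images, 49 points (4 control, 45 new).
# The 6 border images see 6 points each, the other 15 see 9 points each.
image_points = 6 * 6 + 15 * 9        # 171 image points
observations = 2 * image_points      # x and y per image point
point_unknowns = 3 * 45              # 45 new points, xyz each
pose_unknowns = 6 * 21               # 21 images, 6 DoF each
unknowns = point_unknowns + pose_unknowns
print(observations, unknowns)        # 342 261
```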
only thing we are doing is rearranging the order of the variables in there. We could say: first we put the orientation of the first camera image, then the points that it sees, then the next camera image, then the points that it sees; that would probably be the natural way to do it if you implemented it on your own. What we want to do instead is separate the locations of the 3D points from the 6D orientation parameters. So we put all point coordinates first in the vector — delta k contains the coordinates of the points — and then we have delta t, which despite the letter t is not a translation vector here, but a 6D vector for every camera, the camera orientation parameters. The only thing I am doing is rearranging the variables in my vector so that first I have all the 3D locations of the points and then all 6D orientations of my cameras: x, y, z of the first point, x, y, z of the second point, and so on, and then X, Y, Z, yaw, pitch, roll of the first camera, of the second camera, and so on. So this is the reordering of the variables that I am doing in my unknown vector delta x.

If I then write down my equation with my matrix A, so delta l plus my corrections equals this matrix A times delta x, and I know that delta x consists of a delta k block and a delta t block, this means I can take my matrix A and cut it, not exactly in the middle, but exactly after the columns corresponding to the size of delta k. I make a cut there, and that gives me two matrices, a matrix C and a matrix B. So it's just splitting our matrix A into two matrices, a matrix C and a matrix B, with the same entries. If I then multiply this matrix with this vector, as a result the matrix C will be multiplied with the delta k, the point parameters, and the matrix B will be multiplied
with the orientation parameters, so that I have C times delta k plus B times delta t. So just from the reordering of the vector, I can say this matrix A consists of two parts, and they will probably have different structures, because one is related to the 3D point coordinate parameters and the other one is related to the six-degree-of-freedom orientation parameters of my cameras. So this matrix C and this matrix B will probably reveal different structures, and what we're going to do is investigate the structure of these matrices C and B, and as a result of this, also A.

So, what do we have? This is basically copied from the previous slide: we have our C and B forming our matrix A, times the vector consisting of k and t, which is our x vector. We get the same kind of equation for every individual observation, so for every point observation in every image, and it has two components, because it's a two-dimensional observation: I'm observing the x coordinate and the y coordinate of a point in an image. So for every observation ij, where i is the 3D point that I'm observing and j is the camera image, for example point 15 in camera image number 3, I get a two-dimensional vector of the observed x and y coordinates. As a result, this row block A_ij transposed is a 2-by-U matrix, where U is the number of unknowns. The block C_ij transposed is a 2-by-3 matrix, because it is multiplied with the point coordinates, which are three-dimensional, and the output must be two-dimensional. Similarly, here the output must be two-dimensional and the orientation parameters are six-dimensional, so B_ij transposed must be a 2-by-6 matrix. So I know that these are small blocks, and the coefficient matrix will be filled with those matrices. And the key thing, and that's kind of an
important part here: this row block is 2 times U, where U is the number of unknowns, because the unknown vector is high dimensional, but I know that only a small number of the parameters will actually impact what I am going to observe. For estimating where, say, point 10 is seen in image number 2, the other camera orientations don't matter, and all the other point coordinates don't matter either; it's only the coordinates of the point itself and the orientation of the camera which observes that point. This means U can be very large — in my example from before, this row block would be a 2-by-480,000 matrix — but I know that the result will only depend on a 2-by-3 matrix C_ij, which maps exactly the parameters of the point i I am actually seeing, and a 2-by-6 matrix B_ij, which relates it to the position and orientation of my camera, the camera image j. All the other points don't matter. Of course, this matrix C_ij and this matrix B_ij will be different for every point and every camera image, but it means that the majority of the entries in the row don't matter and thus must be zero. This is a very, very important point, probably the most important point of this lecture today: only the point I am observing and the camera with which I am observing that point tell me something about where this point will be mapped to. All other points and all other camera orientations do not matter for that computation, thus have no impact, and thus must have a zero coefficient. Again, this coefficient matrix is the Jacobian of my error function, and there it also becomes clear: if a parameter is not involved in the estimation, the derivative with respect to that parameter will be zero. That's the reason why the Jacobian, the coefficient matrix here, has all those zeros in there. So our
matrix has this structure over here: these are the sub-matrices describing how the different points are seen. If we expand one of those A_ij row blocks, it will be a block that is zero basically everywhere; just at the right location, where the mapping goes to the point i, sits this matrix C_ij transposed, and the rest is zero. In the B part exactly the same happens: everything is zero except the single block which is related to the camera j that actually observes the point i. As a result, this matrix has these individual rows which are zero nearly everywhere, and just a small number of entries remain. The key insight here is that we have a sparse matrix: a matrix where the majority of entries will be zero and we just have small non-zero blocks. When I was talking about a "row" here, that is not really the correct word, because it is a two-dimensional row block: these are vectors of dimension two, and this is a 2-by-3 matrix and a 2-by-6 matrix. So if you take a true scalar row, it has three plus six, i.e. nine, non-zero entries, and the rest is zero, and I have that basically twice, once for the x coordinate and once for the y coordinate of my observation. Nevertheless, the key insight is: this matrix is sparse, and the reason is that you do not observe every point in every camera image. If you observed every point in every image, then this matrix would be densely filled with values, but that's not the case in realistic setups. In realistic setups we move through the world, we move our camera through the world, and we're
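The sparsity pattern described here — one 2-by-3 C block and one 2-by-6 B block per observation row, nine non-zeros in each scalar row — can be sketched as follows. The tiny observation list is made up purely for illustration:

```python
import numpy as np

def jacobian_pattern(n_points, n_images, observations):
    """Occupancy pattern of A = [C | B]: each observation (i, j)
    fills a 2x3 block in the point part and a 2x6 block in the
    orientation part of its two rows; everything else stays zero."""
    U = 3 * n_points + 6 * n_images
    A = np.zeros((2 * len(observations), U))
    for row, (i, j) in enumerate(observations):
        A[2*row:2*row+2, 3*i:3*i+3] = 1.0            # C_ij block (point i)
        off = 3 * n_points
        A[2*row:2*row+2, off+6*j:off+6*j+6] = 1.0    # B_ij block (camera j)
    return A

# hypothetical tiny problem: 4 points, 2 images, 6 observations
obs = [(0, 0), (1, 0), (2, 0), (1, 1), (2, 1), (3, 1)]
A = jacobian_pattern(4, 2, obs)
print(A.shape)                    # (12, 24)
print(np.count_nonzero(A[0]))     # 9 non-zeros per row: 3 + 6
```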
observing new points; we're not observing the same scene all the time. As a result, in reality this matrix is sparse, and it actually has a very clear structure. If this is my full matrix A, broken up into the matrix C and the matrix B, then along one dimension I have the number of observations, and along the other the number of points and the number of images: these columns correspond to all the points, these to all the camera images, and these rows to all the observations. That is the structure of the matrix A, and you can definitely see a certain pattern over here. This pattern is so symmetric because of my flight pattern: this block corresponds to the first stripe, the second block to the second stripe, and the third block to the third stripe. We can read off certain properties from those matrices. If you make a cut through this matrix and take one column block out, the non-zero blocks that you find in it give you the number of times a point is observed: these are my observations, these are my points, so if I go down a column and count the non-zero elements, it tells me how often a point has been observed by camera images. Again, this is zero most of the time; only a small number of times is a point observed by a camera image. A similar effect appears if we make a cut through the B part: here we have the number of images and the number of observations, so this tells me how many observations I have per image, how many points I see in that image. So for image number one, two, three, four, five, six, and so on, I can read off how many points are observed. You can also see that the blocks for the first, the seventh, the
eighth, and so on, are smaller than the others. The reason is that these are the images at the beginning and the end of a stripe, where I only observe six points instead of nine, and therefore those blocks are a bit smaller than the other elements; there are three points fewer observed. So if I cut through this matrix here, it gives me the number of observations per image, i.e. the number of points that I see, each with an x and y coordinate; and if I make the cut there, it gives me the number of times a point is observed overall. So this matrix has a structure which is related to the measurement process: depending on how I record my data, this matrix will look different.

I now want to dive a little further and investigate the sub-matrices C_ij and B_ij. The matrix C, starting with this one over here, consists of the 2-by-3 sub-matrices C_ij transposed, which are those blocks over here. This is basically the connection between a point observation in an image, x_ij prime, the x-y pixel location of the point in the image, and the actual point that I'm observing. So it connects the 3D points in the world and the observations in the image with each other, and we only have one non-zero 2-by-3 matrix per row of this matrix C, where a "row" is of size two, one for the x coordinate and one for the y coordinate. As shown at the beginning when motivating the sparsity, there is only one C_ij transposed sitting in each of those rows, and the rest is zero. The number of non-zero 2-by-3 matrices per column, this was the column cut I showed before, tells us how often the
point x_i has been observed in the overall mapping process. If the point showed up again, say, the airplane took a second round and did the mapping again, closing the loop and going back to the beginning, then we would have additional entries showing up over here, because the point would have been observed another time. That is what we can see from the sub-matrix C.

We can investigate the properties of the matrix B in a very similar way. This is the second part of the matrix, the one relating the camera images to the observations, so basically describing how many points I am actually observing in the individual camera images. It consists of the 2-by-6 matrices relating the orientation of camera j to the corresponding observations: it links the observed point x_ij and the j-th camera orientation. If I change the camera orientation, this of course has an impact on where the point will be projected to in my 2D image. Again, we have one non-zero 2-by-6 matrix in every row. So if I go through a full row of the matrix A, I will find one 2-by-3 block C_ij transposed and one 2-by-6 block B_ij transposed; basically every row has nine non-zero elements. The number of non-zero 2-by-6 matrices per column, this was the cut that we made over here, tells us how many observations we have in the image, so how many points I am actually observing.

Another question is how these B_ij and C_ij actually look. If you go back to the Photogrammetry I course, to the projective three-point algorithm, we actually did some derivations there, and we have seen the result for the stereo normal case. This is an important caveat: it only
holds for the normal case, i.e. if the camera sits in the normal-case configuration; then those would be the resulting elements of your matrix. The normal case is a special case, and the derivation only holds for it. For the general case, which is what you need if you want to implement such a system, those expressions don't help you, because your cameras are never exactly in the normal case; after all, you want to optimize your camera orientations. The problem is that the general case becomes more tricky. You can still derive it by hand, but it is more demanding: you will write down several pages full of derivations, something in which you will almost surely make mistakes. In practice it is a demanding and daunting procedure, and therefore in practice we always use mathematical toolboxes which do those computations for us.

There are two common ways. One is to compute your Jacobians analytically; this is the standard approach. There are toolboxes out there where you specify the equations, i.e. how your point is mapped through the projection matrix into the camera image, and they compute all the partial derivatives, which form the Jacobian, and automatically generate code that you can put into your C or C++ program. You get the computations automatically, without the mistakes you would make while typing those equations in yourself; this is then specific for your implementation. The alternative, which has become more popular over recent years, is to compute the Jacobians numerically. Then we don't have to do those derivations by hand or put them into our program; there are toolboxes and solvers out there which compute the Jacobians numerically. That is also possible, but the way I would commit to here for this lecture would be computing them analytically: that
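As a hedged sketch of the "compute the Jacobians numerically" option: for a simplified pinhole projection (an assumed model with calibration K, rotation R, translation t, and no distortion — not the full distortion-aware model from the lecture), central finite differences give the 2-by-3 block C_ij with respect to the point coordinates:

```python
import numpy as np

def project(X, K, R, t):
    """Simplified pinhole projection of a 3D point X: no distortion."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]                  # Euclidean normalization

def numeric_jacobian(f, X, eps=1e-6):
    """Central finite differences: one column per parameter of X."""
    J = np.zeros((2, X.size))
    for k in range(X.size):
        d = np.zeros(X.size)
        d[k] = eps
        J[:, k] = (f(X + d) - f(X - d)) / (2 * eps)
    return J

# made-up camera and point for illustration
K = np.diag([800.0, 800.0, 1.0])         # assumed focal length: 800 px
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
X = np.array([1.0, 2.0, 10.0])

C_ij = numeric_jacobian(lambda P: project(P, K, R, t), X)
print(C_ij.shape)                        # (2, 3)
```

The same finite-difference helper applied to the six orientation parameters would give the 2-by-6 block B_ij; solvers that offer numeric differentiation do essentially this internally.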
means we write down our equation and let our toolbox compute the first derivatives with respect to all the unknown parameters, and then we automatically get generated code which does this for us. So we don't specify the derivatives manually here; they would fill several slides if done by hand. I simply assume we have a way of obtaining these C_ij and B_ij matrices.

Once we have our B_ij and C_ij matrices, the next question is how to build up the normal matrix of our normal equation system. Again, the important insight here is that this is a sparse matrix. Why is this the case? Because our Jacobians are sparse. This was my A matrix, and I turn it into my normal matrix by computing A transposed, times the inverse of the uncertainty of my observations, i.e. the information matrix of my observations, times the Jacobian again. This gives me a matrix with the dimensions of the unknowns, so number of unknowns by number of unknowns, which has this form. Through the arrangement of my unknown vector into 3D points followed by the orientation parameters of the cameras, we can again see a certain pattern showing up. This matrix consists of four blocks: the two off-diagonal blocks are the same block, transposed of course, and then we have the two diagonal blocks, which are the interesting parts. The point block consists of 3-by-3 blocks, so it is basically a block-diagonal matrix containing 3-by-3 blocks, and the camera block of this normal equation matrix consists of 6-by-6 blocks, so all the camera orientation parameters are specified down here in 6-by-6 blocks. The off-diagonal elements link the camera orientations with the points they are observing, so they basically tell me which points are
observed in which camera image. You can still see the structure here: it tells you what has been seen, which camera orientation is linked with which point parameter, and the matrix is still sparse. The diagonal blocks are clearly sparse — they only have their block-diagonal elements — and the off-diagonal blocks are still sparse, just less sparse than the diagonal ones. They basically encode how the camera was moved through the environment: if you observe the same scene over and over again, these blocks get more densely packed with non-zero elements; if you walk through the environment only once with a limited field of view, always observing something new, they stay more or less empty, because you only relate each point to the camera images in which you actually see it. We can look at the same example scaled up: say we have seven strips with 15 images per strip, so we just enlarge the environment we are covering. This would be the resulting Jacobian matrix A, with exactly the same structure as before, just with more strips and more images per strip. If we then compute A transposed times the weight matrix times A, we get this structure, and we can see that the number of non-zero elements increases — but the total number of cells grows quadratically with the matrix size, while the fill-in grows only linearly. So under the same movement pattern, the larger the matrix gets, the more zero elements it contains, relatively speaking. If we do a few computations for this simple problem, in the A matrix about 3.3 percent of the cells are non-zero, and the fraction is higher for
the N matrix, because that matrix is smaller — its size only depends on the number of unknowns; the number of observations does not enter. If I go for the larger problem, those numbers decrease: the larger the matrix, the sparser it gets. The fill-in grows roughly linearly as I add camera images and points, while the total number of elements grows quadratically, so there are more and more zero elements compared to non-zero ones. Okay, so let's have a closer look at this normal matrix. Besides the fact that it is sparse — so we should use a sparse solver — maybe we can exploit even more of its structure. We assume a block-diagonal covariance matrix for our observations: each point observation has a 2-by-2 covariance matrix in the image, and the individual point observations are independent of each other. Every 2-by-2 block can have off-diagonal elements, but overall it is a block-diagonal matrix — there is no correlation between different point observations. That is the assumption here. In this case, for my normal matrix, which is A transposed times the information matrix times A, I can split A up into the blocks C and B, so A transposed gives C transposed and B transposed, and multiply those elements out. I obtain blocks relating the points, the cameras, and the cross terms — the elements N_kk, N_kt, N_tk, and N_tt — and these are exactly the elements we discussed before. In the example down here, N_kk is this matrix over here, N_tt this block, and those correspond to the off-diagonal blocks, where one is
just the other one transposed. So what do we have? N_kk is a block-diagonal matrix with 3-by-3 blocks on the main diagonal; the rest is zero. Similarly, N_tt is a block-diagonal matrix with 6-by-6 blocks encoding the camera orientation parameters. The off-diagonal matrix consists of several 3-by-6 blocks; these are the connections between the points I observe and the cameras from which I observe them. All the elements of this matrix can be computed very easily: the off-diagonal blocks are simply C transposed times the observation information times B, and for the diagonal blocks I sum over all images in which point i has been observed, or over all points observed in image j, respectively. So I can compute all the elements of my normal matrix directly from my small, low-dimensional matrices C_ij and B_ij. You can fill in those numbers directly and do not have to build up the full matrix: you can directly compute and store only the non-zero values. If you use a modern library for sparse matrices, you can simply allocate a large sparse matrix and fill in those blocks — the bookkeeping is then taken care of by the library — but you can also compute and store all the elements individually on your own. Okay, so let's see what else we can do, given that we have split our unknown vector into the 3D point coordinates and the camera orientation parameters. How can we exploit that further? We have our normal equation system N delta x = h, I know that N has the structure exploited before, and I know that x splits into the parts k and t. I split my right-hand side accordingly into h_k and h_t, without specifying them further for the moment. This is the structure of our normal equation system, and now I apply a trick — a trick which may be unintuitive at first, kind of an informed tip. I tell you that we can apply a manipulation to this equation which will simplify our life, and we will see afterwards why. What I do is multiply both sides of the equation with a particular matrix from the left-hand side — a matrix you would not easily come up with on your own, but one which simplifies our computations. This matrix contains elements that stem from the normal equation matrix itself, and I choose it in order to bring the matrix sitting in front of delta x into a special form. It contains the inverse of the N_kk block (which pops up twice), the N_tk block, and an identity matrix. Since I multiply both sides of the equation with the same matrix from the left, I am not changing the equation itself; it is still the same equation. Then I carry out this matrix-matrix multiplication block by block: for example, the first block is N_kk inverse times N_kk, plus zero times the lower block, which gives the identity matrix. I do the same for all the other blocks, ending up with a zero block in the lower left and these equations
over here. And now an interesting thing pops up: this zero element, the zero block in the lower left. Because this part is zero, I can take out the lower block row and solve it on its own — if I multiply that row with the unknown vector, all the delta k terms vanish, so I can solve for the unknown parameters delta t alone. I basically cut out that subsystem and solve only this part: N_tt bar times delta t equals h_t bar, where N_tt bar is the tt block of the transformed matrix N bar, and h_t bar is just the definition of the corresponding part of the transformed right-hand side. So I can turn my matrix N into this reduced matrix N_tt bar, which is smaller but more densely filled; the bottom part I ignore for now (I could do the same thing for the other block). This reduced system is still sparse, and in our case it is a 126-by-126 matrix, because these are the involved orientation parameters: 21 images with six parameters each, and 21 times 6 gives 126.
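The elimination just described — often called the Schur complement trick — can be sketched in a few lines. This is a minimal numerical sketch with made-up dimensions (4 points, 2 cameras) and random, hypothetical values standing in for the real blocks N_kk, N_tt, and N_kt; it only illustrates that the reduced camera system, followed by the back-substitution step the lecture turns to next, gives the same answer as solving the full system at once:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): 4 points with 3 unknowns each,
# 2 cameras with 6 unknowns each.
n_pts, n_cams = 4, 2
dk, dt = 3 * n_pts, 6 * n_cams

def spd(n):
    """A random symmetric positive-definite n-by-n block."""
    R = rng.standard_normal((n, n))
    return R @ R.T + np.eye(n)

# N_kk: block-diagonal with 3x3 blocks (one per point) -- the structure
# that makes its inversion linear in the number of points.
N_kk = np.zeros((dk, dk))
N_kk_inv = np.zeros((dk, dk))
for i in range(n_pts):
    s = slice(3 * i, 3 * i + 3)
    B = spd(3)
    N_kk[s, s] = B
    N_kk_inv[s, s] = np.linalg.inv(B)   # invert each 3x3 block on its own

N_tt = spd(dt)                              # camera block
N_kt = 0.1 * rng.standard_normal((dk, dt))  # point-camera coupling blocks

N = np.block([[N_kk, N_kt], [N_kt.T, N_tt]])
h = rng.standard_normal(dk + dt)
h_k, h_t = h[:dk], h[dk:]

# Step 1: reduced camera system -- solve for the camera update delta_t only.
N_tt_bar = N_tt - N_kt.T @ N_kk_inv @ N_kt
h_t_bar = h_t - N_kt.T @ N_kk_inv @ h_k
delta_t = np.linalg.solve(N_tt_bar, h_t_bar)

# Step 2: back-substitute for the point updates -- no new inversion needed.
delta_k = N_kk_inv @ (h_k - N_kt @ delta_t)

# Sanity check: identical to solving the full system in one go.
assert np.allclose(np.concatenate([delta_k, delta_t]), np.linalg.solve(N, h))
```

In a real implementation N would be held in a sparse format and the reduced system handed to a sparse solver; the dense arrays here are only for readability.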
And therefore this is a 126-by-126 matrix; if we solve this system, we compute the orientation parameters of our cameras while ignoring the point parameters. We can illustrate this: this was the original Jacobian, we computed the normal matrix out of it, and then this reduced normal matrix for the tt part only. You can see that it still has a sparse structure. It is more filled than the original N_tt block — you get additional off-diagonal blocks — but it remains a sparse matrix, and the larger your problem is, the sparser this matrix stays. So let's look at how difficult it actually is to compute this matrix and then to solve it. To compute N_tt bar we only need blocks from the matrix N, and only one operation requires a matrix inversion: we need to invert the N_kk block. The good news is that this can be done easily. Why is this the case? Because we know that N_kk is by design a sparse, block-diagonal matrix whose blocks are always 3-by-3. The only thing we need to do is invert those individual 3-by-3 blocks, which is much easier than inverting a dense matrix — it is an operation which is linear in the number of points. That means I can compute my matrix N_tt bar, and then I solve the reduced system, still using a sparse solver: a library which supports sparse matrices and allows me to solve them quickly. So I have this matrix, which is not only number-of-unknowns-squared in size but also a
sparse matrix which I need to solve, and if I solve this system, I solve it for delta t: I obtain my camera orientation parameters — I know where the cameras are after this trick — and I can do so efficiently because N_kk is this block-diagonal matrix. If N_kk were dense, already setting up the reduced system would be a very costly operation, and the whole trick would not be of any help. After this, I have completed the first step of my estimation problem: I have computed the orientation parameters of my cameras. Now I need to compute the parameters of my 3D points, exploiting the fact that I know the camera parameters — I want to obtain the 3D points given the delta t that I just computed. And this can now easily be done. I still have the transformed system from before, where we cut out the lower part and solved it separately, so delta t is known now. I can take the first block row and directly exploit that delta t is known: the identity times delta k, plus N_kk inverse times N_kt times delta t, equals N_kk inverse times h_k. This is an equation which contains the parameters delta k and involves the delta t computed in the previous step — and I can only do this because I was able to solve for delta t separately, without knowing delta k. So what I then do is move the delta t term to the other side, so that the resulting equation returns delta k: my point coordinate update is the inverse of N_kk — again, this is the block-diagonal matrix which I have already inverted, so there is no extra effort here — times the h_k
block minus N_kt times delta t, where delta t is the known quantity solved for before. So in the second step I can obtain delta k, needing only matrix-vector multiplications — no new inversion, because N_kk was already inverted before, and that inversion was linear in the number of points. So I can do this step efficiently as well. What has happened is that, by design — by how I set up my matrices — I was able to separate the problem into blocks: the orientation parameters of the cameras on one side and the 3D point coordinates on the other. Based on this structure, I could see how the matrices C_ij and B_ij are built from the sparse Jacobians; this led me to a sparse normal equation system; and then I could do the trick of multiplying this system from the left-hand side with this magic matrix, which turned it into a problem that splits into two parts — first solving for the camera orientation parameters, and then solving for the point coordinates — all while exploiting the sparsity of those systems. This is the trick I apply in order to solve this, in theory very large, system in an efficient manner. As a further note, the full Jacobian or coefficient matrix A does not need to be constructed explicitly. What you typically do is compute only the non-zero elements you need, often without ever forming A, and directly compute the elements of your normal equation matrix N from the low-dimensional matrices — these 2-by-3 and 2-by-6 blocks. These are all small, low-dimensional matrix multiplications needed to fill the
individual blocks of my N matrix, and all of this can be done in an efficient manner. Then you construct your reduced system; for that you again need to invert N_kk, but this can be done in linear time in the number of points, because it is a block-diagonal matrix. Finally, you solve the reduced system with a sparse solver and then obtain the point coordinates from the result. That is the way you solve your system of linear equations in the context of the bundle adjustment problem, and how you can solve even systems which at first glance seem too large to be solved in a reasonable amount of time. The second thing I want to briefly look into is what happens if we apply bundle adjustment without control points — something we also need to take into account. We can still compute a bundle adjustment solution without fixed control points, but note that in this case bundle adjustment will only give you a photogrammetric model, not a model where the scale information is available: everything is defined only up to a similarity transform, a rigid body transform plus a scale parameter. We have seen this before — it is what we used the absolute orientation problem for, in order to orient a photogrammetric model and anchor it in the real world. What we now need to consider is this: given that without control points the solution is only defined up to a similarity transform, will this lead to numerical problems for us? It turns out that yes, this will actually be the case. The problem is that the matrix N will have a rank deficiency, a seven-dimensional rank
deficiency, because there is a whole solution space: you can take the same photogrammetric model, move it anywhere in the world, and scale it up and down, and it will always be an equally good solution. So we have a so-called gauge freedom, and that means we need to specify a datum: we need to constrain these seven degrees of freedom so that we get rid of the rank deficiency, because otherwise our solver will most likely report an error, and then we do not know what to do. So how can we change our problem in order to fix a datum? We can do this by introducing additional constraints. For example, we can say that the center of mass of the point cloud should not change — we constrain the center of mass so that it does not move. Alternatively, we can fix a certain point as our origin, which then should not move. We can also say that the main directions should not change: if we compute the covariance matrix over all points and its eigenvectors, we want those eigenvectors to keep pointing in the same directions. And we can say that the average distance of the points to the center of mass should not change — this fixes the scale. So given your points, you compute their center of mass and fix it; based on the distribution of the points you define your coordinate system, for example taking the eigenvector corresponding to the largest eigenvalue as your first axis; and you look at the spread of the points in order to fix your scale. This is one set of constraints you can
set in order to fix the seven degrees of freedom by introducing seven constraints: the scale should not change, the center of mass should not change, and the orientation of the main spread of your 3D points should not change. That is one way of fixing the datum. There are other ways: for example, you can set one of the points as the origin of your coordinate system and then use some other points nearby to specify the main directions and fix the scale. You can do this with other constraints as well, but you basically express them through a constraint matrix H, requiring that H times x equals zero, and the corresponding Jacobian is added to the normal equation system in order to perform this constrained minimization. So you add additional constraints which fix your gauge freedom, and you need to do this because otherwise you will have a rank deficiency and cannot come up with a unique solution of your linear system. That was the first remark I wanted to make at this point; I also have a short second remark. We already discussed outliers, and how to deal with them, in the first part of the bundle adjustment lecture. So far we have assumed everything to be outlier-free — we were still living in our Gaussian world, where the remaining points are outlier-free and subject to Gaussian noise, so that we can compute our statistically optimal solution. In practice, we perform a couple of steps beforehand in order to get rid of those outliers, or at least reduce the probability of outliers in our data set: breaking the problem down into small blocks, solving them separately, and using other outlier rejection techniques which I briefly mentioned before. However, since we never know whether we
actually found all the outliers, what we typically do is also add a robust kernel to our state estimation problem. That means we move away from the parabola — the shape that stems from the log-likelihood of the Gaussian distribution — and change the error function so that it has a different shape. What we basically want is that the penalty increases more slowly for large error values: not a parabola anymore, but a function which grows more slowly, or even levels off at a constant. There are different ways to reduce this penalty for terms with large errors, far away from zero. The L1 norm is one example: everything turns into a linear function. The Huber kernel is a very popular choice: it keeps the parabola close to the zero-error configuration and then turns into a linear function further out. There are even more extreme kernels which keep the parabola in the middle and then go to constant values, so that points which are outliers — say, more than a few sigma away — have their impact reduced essentially to zero. What typically happens in bundle adjustment is that you settle on one of those robust kernels. The problem is that if you strongly down-weight the effect of errors which are far away, this always assumes a good initial guess: if your initial guess is very far from the actual solution, you end up on the plateau of your error function, and the Jacobians will not pull your points towards the right solution. So if you have a very good initial guess, you are probably fine with an error function which looks like this; if you have a bad initial guess, you want to stick to a more conservative kernel function such as the Huber kernel. Depending on what problem you have and what you know about it, you may vary the robust
kernels that you use, or even vary those robust kernels over the iterations of your bundle adjustment. One of the nice things is that there is even a formulation of a robust kernel as a parametric family: you have a kernel function into which you put your error, an alpha, and a c parameter, where c specifies the width and alpha specifies which function you are choosing. As a result, you get different kernels: if alpha is two, you basically have your Gaussian (quadratic) function, and as you decrease alpha from two down towards minus infinity — this curve here would be the value for minus infinity — you obtain the different kernels. So you can also use such a family in order to down-weight the effect of errors, and even change the kernel during the optimization process. What does this change? One can show that introducing such a robust kernel is equivalent to adding an additional weight term to your points, where the weights depend on the current error configuration. This is the weight function you obtain for the different kernels: a constant weight of one holds for the Gaussian — then we have our standard problem, every point is weighted with one, everything is identical. But if you choose other kernels — the green one here, I think, is the Huber kernel — you get a weighting factor which decreases as you move away from the zero-error configuration, so the impact of such a point is reduced through this additional weight. So you can use those robust kernels, turned into weights in a weighted least squares problem, and that is how it is done in general: you will always have a robust kernel running in the background to make sure you are not ruining your
solution if you missed a few outliers in the outlier rejection scheme that you used initially. Okay, this brings me to the end of today's lecture on the numerics of the bundle adjustment problem. So what do you need to take into account when you want to solve the bundle adjustment problem? Again, bundle adjustment is a least squares solution to computing the orientation of cameras and the positions of 3D points in the world under uncertainty. We do this by minimizing the so-called reprojection error: the error we measure by mapping the point where we think it is into the camera where we think it is, and comparing this estimated image location with the actual measurement — where the point really appeared in our image. We try to find configurations of the 3D points in the world and of our camera orientations such that this distance is minimized, minimizing the sum of squared errors, possibly squeezed through a robust kernel function. In the standard least squares setup, this leads, for realistic scenarios, to very large linear systems; if we just used them out of the box, without exploiting any properties of these systems, they would be too large to be solved. The first key insight is that the bundle adjustment problem leads to sparse matrices in nearly all setups: whenever we do not observe every point in every image — and we typically do not — we get a sparse Jacobian A and, as a result, a sparse normal matrix N. So the first thing we need to do is use a sparse solver: libraries that are optimized to store only the non-zero elements and take only those into account for the multiplications and inversions, so that everything can be done fast by picking only the elements which are non-zero. And this is
basically done automatically for you today, if you just tell the library that the underlying matrix is a sparse matrix. The second thing we can do is split the estimation problem into two parts: first solving for the orientation parameters of the cameras by building the reduced normal equation system, and then, in a second step, computing the point parameters — the 3D locations of the points — given the camera parameters. We do this in every iteration, and in this way decouple the estimation of the parameters of the points from the estimation of the camera parameters within the bundle adjustment problem. With this, I think you know the main tricks and the things you need to do in order to actually build a solver for your bundle adjustment system. I hope this was useful and has completed your knowledge about 3D reconstruction — how to localize a camera and compute a 3D model of the world using today's gold standard method, which is bundle adjustment. Thank you very much for your attention.
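As a closing aside on the robust kernels summarized above: the equivalence between a robust kernel and a reweighted least squares problem can be illustrated with the Huber kernel. This is a minimal sketch; the threshold value k = 1.345 is a common default in robust statistics but is an assumption here, not a value fixed by the lecture:

```python
import numpy as np

def huber_weight(residuals, k=1.345):
    """Per-residual IRLS weight w(e) = rho'(e) / e for the Huber kernel:
    weight 1 (pure Gaussian / quadratic behavior) for |e| <= k, and k/|e|
    beyond, so large errors enter the weighted least squares down-weighted."""
    e = np.abs(np.asarray(residuals, dtype=float))
    w = np.ones_like(e)
    big = e > k
    w[big] = k / e[big]
    return w

# Inlier-sized residuals keep full weight; a gross outlier barely contributes.
r = np.array([0.1, -0.8, 1.0, 12.0])
w = huber_weight(r)   # -> [1.0, 1.0, 1.0, ~0.112]
```

In an actual bundle adjustment these weights would be recomputed from the current residuals in every iteration and multiplied into the observation weights when accumulating the normal equations.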
Info
Channel: Cyrill Stachniss
Views: 3,748
Keywords: robotics, photogrammetry
Id: LKDLcKrWOIU
Length: 64min 0sec (3840 seconds)
Published: Tue Sep 29 2020