The Basics about Bundle Adjustment (Cyrill Stachniss)

Captions
Hello, I want to talk today about bundle adjustment and explain the key ideas of this gold standard method that we use in photogrammetry as well as in computer vision to perform 3D reconstructions of the environment from camera data. So given that you have images taken with a camera and want to turn these images into a 3D model of what you are picturing, this lecture is for you. It is the first part on bundle adjustment; there will be a second lecture coming soon that looks into the numerics of bundle adjustment, so which numerical tricks we can use to solve the bundle adjustment problem in an efficient manner. Today we look into bundle adjustment itself: what are the key problems I face if I want to do my 3D reconstruction, and how can I actually tackle them?

3D reconstruction is a relevant task in a lot of applications, and today we want to look into how we can do it from image data alone, so without a range sensor such as a laser rangefinder, working only on a large set of camera images. In the lectures so far we looked into camera pairs, so how to estimate 3D information about the scene given an image pair, and today we extend this to an arbitrary number of images: given n > 2 images, how can we perform a 3D reconstruction of the scene? This is still a very relevant task and something we use very often as a standard tool for measuring tasks. What you see here, for example, is a UAV that flies through the environment, takes images with its camera, and then builds 3D models of the scene; in this example it is a UAV flying over a field, and the resulting estimate could look like this: the blue planes are the image planes at the different locations where the system has been, and what you see down here are triangulated feature points, for example SIFT features that have been extracted from the camera images and triangulated to estimate the locations of those 3D points in the environment. This would be the output of a bundle adjustment system. We can process this output further in order to solve other tasks, for example generating orthophotos. An orthophoto is a special type of image, which we will also investigate within our photogrammetry lectures, in which you can measure in the x-y plane, so a distance of, say, 20 pixels over here corresponds to the same distance as 20 pixels over there. You can see the overlay of the orthophoto of that field with the locations where the UAV has been flying. We can of course also turn this into a digital elevation model, so that you see the height information extracted from the field, even individual plots with different heights. This is the information you can extract from this camera data. This is just one classical example, using it for mapping from the air with a UAV, from an airplane, or in the old days from a balloon, but of course you can also move your camera freely through the environment and perform the 3D reconstruction task without being constrained to an aerial vehicle.

We may ask ourselves why we want to do multi-view reconstruction at all, so why two images are not enough, and there can be several reasons. One explanation could be that the object simply has too complex a surface, so that two images are not sufficient to picture the whole surface of the object, or that the resolution one or two images provide is not sufficient to fulfil the precision requirements I have, and then I need multiple images to get an accurate reconstruction of the environment. We may also want to estimate the motion of the sensor through the environment, for example estimate the location of the UAV or track the position of a car or robot, and in this case we are of course interested in a sequence of images taken by the camera installed on the vehicle. So on the one hand we can estimate the location of our sensor in the environment, and on the other hand we can estimate a model of the environment itself.

I brought a small example; it stems from Luc Van Gool's lab at KU Leuven and a spin-off company, GeoAutomation, which performs mapping tasks using image technology. What you see is a van moving through the environment with several cameras installed on it that picture the surroundings of the car. Eight camera images are taken, always in pairs of two in a stereo setup: one pair looking to the front, one pair looking to the left, one pair looking to the right, and one pair looking backwards. By moving through the environment you perceive it from different locations, record those images, extract features, and estimate the location of your vehicle with respect to those features; relative orientation plays an important role here. Then we can put those things together into, in the end, a very large least squares problem that we want to solve, and its solution is the solution of the bundle adjustment problem. In the end we can perform mapping at city scale and obtain 3D point cloud information about all the streets where the vehicle has been driving, just based on the camera data. You can overlay this with maps, or of course use it to generate maps; it is a standard mapping technique for building models of the environment and estimating the vehicle within that map. And if we have that map, we can also go into the generated 3D point cloud and simulate what a virtual camera driving through the environment would see, so where points would be mapped to in its image. I can do this with the location of my actual camera, but also with a virtual camera, so that I can realize fly-throughs through the mapped space. These are all tasks I can do with the output of such a bundle adjustment system.

Bundle adjustment, originally called bundle block adjustment, is the technique of estimating the orientation of my cameras, a six-degree-of-freedom parameterization for every camera, together with the 3D locations of points in the environment, and doing this in a block, so taking a larger set of images into account simultaneously when performing the estimation. Today the term "block" is typically dropped in modern literature, so we just call it bundle adjustment. It is a technique that has been around since approximately the 1950s, developed in the photogrammetry community.
It was traditionally used for building maps from aerial vehicles: by flying over the environment and triangulating certain points, I can estimate where the vehicle was and where the points in the environment are. We typically also use so-called control points, indicated here by the triangles; these are points for which we already know the 3D locations in the scene, and we can use them to anchor the photogrammetric model that we are building with our bundle adjustment system in the real world, and also to fix certain points at certain coordinates. Aerial triangulation is a common task and one of the standard tools when you want to build a map of the environment or perform measurements, for example using a UAV to collect your image data, and what all the out-of-the-box software you may be using for your model estimation does is basically bundle adjustment; it is the standard solution. There are of course different flavors that exploit different assumptions: if your UAV has a GPS on board, it is a wise choice to use this GPS information for your 3D reconstruction, and therefore you may have different bundle adjustment systems exploiting different properties. But overall this is an automated process today and really a standard, frequently used tool. And again, this was the example shown at the beginning; it is the real-world counterpart of the old illustration, originally by Ackermann, and this is how it looks today in reality.

So the main question we need to answer is: how does bundle adjustment work? What is behind it? How do I turn my images into a 3D representation of the world? As before, we assume in all our 3D reconstruction tasks that we can extract features from our image data. These can be SIFT features, SURF features, binary features, whatever; we assume we are able to extract distinct points in our images, and then we only use these points to perform our 3D reconstruction. That means for all the features I extract from my images I want to estimate the 3D location of the object that generated this feature response in the real world, and I do this using a non-linear least squares approach, estimating the camera poses as well as the locations of the 3D points in the environment simultaneously.

The overall idea is explained rather quickly. As with all non-linear least squares approaches, we assume we have some initial guess; we will talk later in this lecture about how we may obtain it. We take the initial guess, in terms of 3D locations of points in the environment and six-degree-of-freedom camera orientations, and project those 3D points into our virtual camera images. That is, we ask: assuming the camera locations and the 3D point locations are correct, to which pixel in my image plane would each 3D point be mapped? We perform this mapping virtually and get a pixel coordinate for each point in each camera image, and then we compare this projection in the 2D image plane with the actual measurement of that point, by checking in my real camera images to which pixel location this point has actually been mapped. If there is no discrepancy, everything looks good: the point and the camera configuration are consistent with this observation. But often this is not the case; often there will be a discrepancy, and this is the error I am trying to minimize. I am trying to find configurations of my camera orientations as well as of the 3D points in the world that minimize this so-called reprojection error: I project the 3D points back into the camera images, compute my error in the camera image, basically an offset in x and y, and I optimize, so I change my camera poses and the locations of my 3D points in the world in order to minimize this error. In the end, hopefully, the error is close to zero everywhere. That is the overall approach, and of course we need to iterate this process: I get an update of my unknown parameters, reproject, and perform these steps over and over again, because it is a non-linear least squares approach, we need to linearize, and as a result we need to iterate.

What I now want to do is look a little further into this reprojection error, so into the projection of the points from the 3D world into the image plane and the discrepancy between where we think the points should end up and where they end up in reality. At this point I assume you know how to map a point from the 3D world into a camera image with our standard equation, small x = P X, where x is the location in pixel coordinates, P is the projection matrix, and X is the 3D point in the world. From this we arrive at the equation shown over here, which is the basic equation of my least squares approach, my Gauss-Markov model. What I have on the left-hand side is the pixel location of point i observed in camera j, so i is my point index and j is my image index; this is the pixel coordinate in some arbitrary image frame. Added to it are my corrections v_xij, the discrepancy between where the point is mapped to according to my current estimate and where I actually observe it, the first term being the actual observation. This should equal the expression on the right-hand side. Starting from the right, we have the 3D location of point i in the environment, where I assume this point to be; initially this would be my initial guess of point X_i in the world, expressed in homogeneous coordinates. I project this point through the projection matrix of my camera, which gives me, together with the scale factor out front, the point in my image plane. So this is my projection matrix P, with its parameters and potentially non-linear calibration parameters, and the scale factor; remember that a homogeneous entity is only defined up to a scale factor, which is why this scale factor appears here.
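To make the projection and the reprojection error concrete, here is a minimal numpy sketch of the step described above. It assumes an ideal pinhole camera, so only the linear intrinsics in K and no non-linear distortion parameters; the function and variable names are my own illustration, not from the lecture.

```python
import numpy as np

def reprojection_error(K, R, X0, X_world, x_observed):
    """Project one 3D point into one camera and compare with the measured pixel.

    K          : 3x3 calibration matrix (the linear intrinsics)
    R          : 3x3 rotation matrix from world to camera frame
    X0         : projection centre of the camera in world coordinates (3-vector)
    X_world    : current estimate of the 3D point (3-vector)
    x_observed : measured pixel location of that point in this image (2-vector)
    """
    # Projection matrix P = K [R | -R X0], mapping homogeneous world points to the image.
    P = K @ np.hstack([R, (-R @ X0).reshape(3, 1)])
    x_h = P @ np.append(X_world, 1.0)   # homogeneous image point, defined only up to scale
    x_projected = x_h[:2] / x_h[2]      # divide by the last component: Euclidean pixel coordinates
    return x_observed - x_projected     # 2D reprojection error for this point/image pair

# Bundle adjustment tries to choose all R, X0 and X_world such that these
# errors become small for every observed point in every image.
```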
If we look at this in terms of a minimization problem, this projection matrix contains a couple of parameters: the projection parameters as we know them from the direct linear transform and the affine camera model, and in addition non-linear distortion parameters, as well as the pixel location where the point would be mapped to before the non-linear corrections, so that the appropriate corrections can be applied and all the non-linear errors taken into account. So what we have in this P are all the intrinsic parameters of my camera, typically five intrinsics, plus potentially a number of additional unknown parameters. I also have uncertainties associated with the image coordinates, so how precisely I can actually measure a feature point: if I use SIFT features and can nail down a SIFT feature to, say, a third of a pixel, then this would be the uncertainty I would use, and we can have individual uncertainties for the individual image points in the individual images and also take correlations into account if that information is available.

So what does this reprojection do? It projects a point from the 3D world into my camera image, and the equation encodes the projection performed by my camera, so all the intrinsic parameters as well as the extrinsics: where the camera is and what the internal mapping process looks like. The error is the discrepancy between where I see the point in reality, the real observation, and where I think this point will be mapped to; it is the amount by which I would need to correct the observation in order to obtain equality. This equation encodes a collinearity constraint, and this is expressed through the index i: if I see one 3D point in the world in multiple camera images, say point i in images one and two, I have the observations indexed i1 and i2, and by using the same index i I make sure that exactly the same point in the world is projected into both camera images. The assumption is that the projection rays from the two camera images actually intersect at that point X_i, and that this point can be uniquely determined, because I know for every point in the world where it is mapped to in every camera image. This is what we refer to as known data association: for every feature I extract in my image, I can uniquely identify the corresponding feature point in all other images. In reality this is a very strong assumption, and a lot of the computational effort of bundle adjustment systems goes into finding the right data association and avoiding mistakes, because mistakes in the data association have a bad impact on my state estimation problem, and I am not likely to come up with the correct solution if I do not take into account that there are outliers in my data association. So this is something you have to take care of in reality, that you get your data associations correct.

Okay, so the question now is: what are my unknown parameters, how many unknowns actually sit in this equation that I want to estimate? First, we have our 3D point in the environment, which has three coordinates, x, y, z, in the world, so I need three unknown parameters for every point that I see. I have my scale factor, a one-dimensional real number, which is just the factor that comes from working in homogeneous coordinates, where everything is only defined up to scale. Then I have my exterior camera orientation, so where the camera is in the world; this is a six-degree-of-freedom quantity encoding the x, y, z location of the projection center of my camera as well as the direction the camera is looking, with three rotation angles, yaw, pitch, roll for example. Then I have five parameters for the standard intrinsics in my projection matrix, the linear parameters of the affine camera model, and a certain number q of additional unknown parameters describing non-linear errors, for example a barrel distortion, for which I would need one or two parameters. These additional parameters are typically independent of the images I am taking: I typically assume that the camera intrinsics do not change when I am at a different position. If this is not the case, we may also have to take that into account; for example, if you have an airplane flying and your camera is exposed to strong temperature changes, certain parameters of your projection may change during flight as the camera cools down high up in the air, and potentially you need to model that as well. For now, we often assume that the calibration parameters are constant during the data acquisition. In an extreme case, if you just take a set of images and have no idea which cameras they were taken with, every image can have completely different calibration parameters; these are the extreme cases of bundle adjustment where we do not use one measurement camera but, for example, collections of images taken from the internet and want to build a 3D reconstruction of the scene, which is then more involved because every image is generated with a different camera and thus a different calibration matrix. We can also break our projection matrix up into two parts: the calibration matrix, which includes all the interior orientation parameters, the intrinsics, and the exterior orientation, also called the extrinsics, which is my rotation matrix and the location of my projection center in the environment.

To illustrate the unknowns, let's make a small example. Say we have 10,000 images and see 1,000 feature points in every image, so rather high-resolution images from which we extract 1,000 points each, and let's assume every point is seen on average 10 times, so we have a substantially dense coverage of the environment with camera images. The question is: how many unknowns and how many point observations do we have, given our equation? It turns out the numbers are quite large. What is the dimension of my observation vector? I observe an x and a y coordinate for every point in every image, so every observation is two-dimensional; with 1,000 points per image and 10,000 images that means 2 times 10,000 times 1,000, which gives 20 million, so the dimension of my observation vector would be 20 million in this example. Next, the number of unknowns: if I have 10,000 images with 1,000 points each and every point is seen on average 10 times, I am mapping 1 million points in the environment; every point has a three-dimensional coordinate vector attached to it, so 1 million points times 3 gives 3 million parameters for the point coordinates. Then, however, I have a large number of unknowns in terms of the scale parameters, because a scale parameter appears for every camera image and feature point pair, so I get 10 million of those; the scale parameter is shared between the x and y coordinates of one observed point, but every observation of every point has its own scale parameter. Then I have the camera orientations: 10,000 images with six degrees of freedom each, so 60,000 orientation parameters. If I assume my camera has just one set of calibration parameters, I may additionally have somewhere around six to ten parameters for the intrinsics, which we can completely ignore in terms of their number. If I sum this up, I end up with something around 13 million unknown parameters, so a 13 million dimensional vector of unknowns and a 20 million dimensional vector of observations. You can already see that these are really huge numbers, and we need to see how we can get them down.
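Just to redo the arithmetic of this example in code (the numbers are the hypothetical ones from the example above; the last line anticipates the reduction from dropping the scale factors, which is discussed next):

```python
num_images = 10_000          # camera images
points_per_image = 1_000     # extracted feature points per image
views_per_point = 10         # every 3D point is seen on average 10 times

observations = num_images * points_per_image   # 10 million observed image points
obs_dimension = 2 * observations                # x and y per observation -> 20 million

num_points = observations // views_per_point    # 1 million distinct 3D points
point_unknowns = 3 * num_points                  # 3 million point coordinates
scale_unknowns = observations                    # one scale factor per point/image pair -> 10 million
pose_unknowns = 6 * num_images                   # 60,000 exterior orientation parameters

print(obs_dimension)                                      # 20,000,000 observations
print(point_unknowns + scale_unknowns + pose_unknowns)    # ~13 million unknowns (homogeneous)
print(point_unknowns + pose_unknowns)                     # ~3 million unknowns (scales dropped)
```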
If we look at the number of unknowns, there is one quantity in there that I am actually not interested in: the scale parameter. I know I need it because I have expressed everything in homogeneous coordinates, and every homogeneous entity is only defined up to a scale factor, so as long as I work in homogeneous coordinates I have to carry that scale factor along. But in practice I do not care about it; in the end I want the x, y, z coordinates of my points in Euclidean space, the three-dimensional location vector of each camera, and the three rotation parameters of each camera, and the rest I do not care about. So this is the point where we should move back from the homogeneous world into the Euclidean world. We like our homogeneous world because it typically allows us to express things more easily than in the Euclidean world, but now is the time to move back in order to get rid of those scale parameters, because this will reduce the number of unknowns from about 13 million to something like 3 million, which is a big decrease, and so it is something I should do. What happens if I turn the expression back into Euclidean coordinates? I basically divide by the last component, and I can rewrite the equation accordingly: the vector on the left, just by the change of font, turns from a three-dimensional vector, so two plus one dimensions in the homogeneous world, into a two-dimensional Euclidean vector, a two-dimensional Euclidean offset, and on the right I have the original projection expression where the first and second rows are divided by the third row of the resulting vector, which is exactly the normalization that comes with the mapping from the homogeneous world into the Euclidean world. If this confuses you, I recommend going back to the lectures on homogeneous coordinates, so that you understand how that mapping actually happens. With this I have reduced my unknowns from roughly 13 million to approximately 3 million, which is a big gain.

So in this example I have 3 million unknowns and 20 million observations. This still gives me a very large system that I need to construct, but I am better off than with 13 million unknowns, and theoretically from here my standard procedure starts: setting up my system of linear equations, my normal equations, in the least squares sense, with my unknowns x and my observations l. I am using the standard notation from the least squares estimation community here, where x are the unknowns and l are the observations; be aware that the x we used before was an observation and this x is now the vector of unknowns, so make sure you do not mix that up. We set up our standard system of linear equations: the Jacobian transposed times the information matrix times the Jacobian, times delta x, the change in my parameters, equals the Jacobian transposed times the information matrix times the delta in the observations. By solving this system I get an update to my unknown parameters, and in this way I can iteratively estimate my unknowns and solve my least squares problem. From a theoretical point of view that is all good: we know what to do, and solving the bundle adjustment problem is just solving a least squares problem.
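Written out as formulas, the normal equation system described here (my reconstruction from the verbal description, with A the Jacobian of the projection function, \(\Sigma_{ll}\) the covariance of the observations, and \(\Delta l\) the vector of reprojection errors) would look roughly like this:

\[
\bigl(A^{\top}\,\Sigma_{ll}^{-1}\,A\bigr)\,\Delta x \;=\; A^{\top}\,\Sigma_{ll}^{-1}\,\Delta l ,
\qquad
x^{(k+1)} \;=\; x^{(k)} + \Delta x ,
\]

and because the projection is non-linear, A and \(\Delta l\) are recomputed at the new linearization point and the system is solved again in every iteration.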
In practice, however, it is slightly more tricky. One reason among others is that this linear system gets huge very quickly; for real-world setups it gets very, very large, and as a result the typical out-of-the-box solving of a dense system of linear equations cannot be applied, because it is too inefficient for the computers we have. Therefore we need a couple of tricks and have to exploit some of the properties of the bundle adjustment problem; that is what I am going to cover in the next lecture, on the numerics of bundle adjustment. For today we ignore this and assume we can solve this very large system, and we do not care that in practice our memory is not even large enough to store all the quantities involved. So we assume we can solve it, and in this way obtain our unknown parameters, an estimate of where the cameras have been and where the 3D points are in the environment.

How does the result look? Here is an example from a couple of images taken in our kitchen next door. You can see the estimated locations of the cameras; a few objects were placed on a table, and feature points were extracted on the table, a couch, and the objects. Just by taking a couple of those images, registering them with respect to each other, fixing the data association, and doing the estimation, we can estimate the locations of these points in the scene; a standard output of a bundle adjustment system, in this example created with the open-source software Meshroom, which you can use by plugging in your camera images and getting a 3D reconstruction of the environment. The software can do even more: it can also estimate surfaces, so that you get a reconstruction that looks like this, with the table, the two objects placed on it, and the couches. You can see that the fabric of the couches is not perfectly flat and smooth, so there is still some noise involved, also resulting from the fact that data association on the fabric is probably very challenging because it is hard to find distinct points there, but all in all this gives you a really accurate reconstruction of the environment and turns a set of camera images into a nice 3D model. You can try that on your own: take images with your smartphone or your camera, plug them into Meshroom, give it a couple of hours of computation time, and it will generate a model with 3D point locations as well as surfaces of the environment; this is a typical result you get out of it.

After we have done that, we want to inspect the result a little further. What are the properties of the result, and what do I need to take care of, especially considering that I am taking a least squares approach, for example with respect to the initial guess, with respect to outliers, or with respect to control points, so points with known location in the environment?

First, a few words about the properties of bundle adjustment. The great thing is that bundle adjustment is a statistically optimal solution under certain assumptions; statistically optimal means there is, in a statistical sense, no better way of solving the problem, but of course there are assumptions attached. The bundle adjustment approach exploits all the observations that have been taken and considers all the uncertainties and potential correlations that we have; if we can specify them, the system can take them into account, and in this sense it uses all the available information to estimate the unknowns. That means it estimates the orientation parameters of my cameras, the exterior and interior orientation, as well as the locations of the 3D points in the environment, at high precision. The assumption it makes is that everything is Gaussian, so we have Gaussian noise, and that can be a very strong assumption, especially with respect to the data association: the system assumes known data association and Gaussian noise on the observations, but in practice we can nail down a large number of the data associations correctly while some of them are, or will be, wrong, and if we assume purely Gaussian noise without allowing for mistakes in the data association, we will not get a great model of the environment. The second thing is that, as with all non-linear least squares approaches, it requires an initial guess of where the cameras and the 3D points are in the environment, and this initial guess actually matters: if you have a very bad initial guess, the bundle adjustment is unlikely to converge. So we need to invest some brain power, and maybe some additional sensors, in order to get a good or at least reasonable initial guess and then be able to converge to the right solution.

The first thing I want to look into now is the absolute orientation through control points. Absolute orientation was the task of anchoring the model we have created in the real world and also fixing the scale, so fixing a similarity transform between the model I have computed and the real world. Why do I need to do this? Because the reconstruction from camera images without any additional information only gives us a so-called photogrammetric model, a model that is only defined up to a similarity transform. That means we cannot say where that model sits with respect to an external reference frame; those are six degrees of freedom of the similarity transform, a three-dimensional translation and a three-dimensional rotation. What it also cannot fix is the scale: we do not know the absolute scale of the scene; we can say that one distance in the scene is larger or smaller than another distance in the scene, but we do not know the absolute scale. The reason is similar to the relative orientation we computed earlier: cameras are basically direction measurement devices, which tell us nothing about absolute scale unless we have additional information, for example a known size of an object in the world, a known translation that the camera has undergone, or an additional sensor; otherwise everything is only defined up to a scale factor. So we need to fix the scale and the six-degree-of-freedom rigid body transform, in sum a similarity transform between our photogrammetric model and the real world, and this is something we can solve with the absolute orientation approach we have discussed: if we know the 3D coordinates of a certain number of points in the environment, in the simplest case three or more points that we have seen in the world, we can anchor the model using absolute orientation as discussed previously. But we can also integrate this directly into our bundle adjustment, by just adding the control points to our least squares problem and solving everything jointly without explicitly executing the absolute orientation, and that is what is typically done.

This directly raises another question, though: how well do I actually know my control points? Are my control points really perfect, or do they also have an uncertainty associated with them? What typically happens is that you measure your control points with an additional, more accurate measurement device and then observe them with a less accurate device such as a camera, so the more accurate sensor allowed you to nail down the location of your control point, but there is still typically some noise associated with it. So the key question is: should I consider my control points, the points for which I know the x, y, z location in the world, as noisy or as noise-free? If I take them into account in my least squares approach as noisy, I say my estimated coordinates should be identical to the provided coordinates plus some corrections, because the information provided to me may not be exactly correct; this correction vector is probably small, but it may be non-zero. The alternative is to say the provided coordinate is the real coordinate, and I take the coordinates of the control points out of my minimization problem. So, noisy or noise-free? We can answer this under different objectives. If we are interested in the statistically optimal solution, taking into account all the points including the control points, we should consider the control points as noisy, because in reality they are noisy, they are not perfect, and we should take that noise into account, at least as soon as we have it available; this is needed for the statistically optimal approach. If I instead fix the control points, it means I am enforcing a geometry onto the bundle adjustment solution: certain points will not be moved, they stay at the locations where I fixed them. This can also be interesting, for example if you want to build a model and align it with some official map data, a map that you as a user are not able to correct but that you want your model to be in line with; then you really want to fix your control points in order to enforce this geometry on the model you are computing, so that the resulting model is in line with the official map data. This is then a statistically sub-optimal approach, meaning that in a statistical sense you do not get the optimal solution, but you enforce the additional constraint that these control point locations must be exactly these coordinates. So it depends a little bit on where your data comes from and what you want to do with the resulting model.
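As a small sketch of the two options just discussed, here is how a control point could enter the least squares problem either as a noisy observation or as a fixed quantity; the function names and the simple diagonal noise model are my own illustration, not part of the lecture.

```python
import numpy as np

def control_point_residual(X_estimated, X_control, sigma_control):
    """Option 1: treat the control point as noisy.

    The estimated coordinates should equal the surveyed coordinates up to a small
    correction; weighting by the control point's own uncertainty makes this just
    one more (strong) observation in the least squares problem, which keeps the
    solution statistically optimal if the noise model is right.
    """
    return (np.asarray(X_estimated) - np.asarray(X_control)) / sigma_control

def fix_control_point(unknown_points, fixed_points, point_id, X_control):
    """Option 2: treat the control point as noise-free.

    Its coordinates are removed from the unknowns and pinned to the given values,
    which enforces the external geometry (e.g. an official map) on the solution,
    at the price of being statistically sub-optimal.
    """
    unknown_points.pop(point_id, None)
    fixed_points[point_id] = np.asarray(X_control, dtype=float)
```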
You can also do both things together: you start with noisy control points and compute the statistically optimal solution, and then you perform a statistical test in order to search for gross errors or outliers in your control points, because maybe there was a mistake in a control point, someone noted a wrong ID or mixed up the coordinates of two points, something that should not happen but of course can happen. You can test whether the uncertainties you assumed for those control points are in line with the result you get, and if that is not the case for some control points, you eliminate those outliers, do not consider them, fix only the other control points, and then run your bundle adjustment with the fixed control points, if you want to be in line with, for example, official map data. If you do not need that, you can skip the second step and are done after step one with your statistically optimal solution.

A few last words about control points: how many do I actually need? If I think back to my previous solutions for computing geometry from camera images, the direct linear transform or the perspective three-point algorithm, we needed three to six control points per image pair, so for every pair of images I needed three, four, up to six control points depending on whether I used the P3P or the DLT solution. How many control points do I need now for the bundle adjustment problem? If I run the bundle adjustment completely without control point information and later compute the absolute orientation, assuming that only a similarity transform separates the resulting photogrammetric model from my map data, then I only need three control points, because there are seven degrees of freedom to be fixed and every control point provides a three-dimensional constraint, so at least three control points are needed. In practice you still want a few more than just three, but the number of control points you need is much, much smaller than three to six per image pair. What is typically done is to cover the boundary of the area you are mapping with control points, if you have that information at hand, in order to minimize the uncertainty of the 3D point reconstruction inside the area you are mapping with your bundle adjustment approach. This reduction in the number of required control points is, besides the statistically optimal solution, one of the key reasons for using bundle adjustment, because it is simply highly impractical to guarantee that every pair of images sees three or even six control points. And again, you do not even need control points if you are fine with obtaining a photogrammetric model by itself.

Okay, now we come to two further very important assumptions that the bundle adjustment system makes, and the question is how we can tackle them, how we can, if not make sure, then at least increase the probability that we fulfil those assumptions so that they do not lead to a problem. The first is the initial guess, my initial configuration. As with all non-linear least squares approaches, we also need an initial guess here, an initial configuration of the 3D locations of the points and of the camera orientations, and the question is how to get it. If my initial guess is very far away from the correct solution, the system is unlikely to converge to the right solution, and as a result you will not get a consistent photogrammetric model out and you will not be able to accurately determine the location of your sensor in the environment. For image pairs we had direct methods for estimating the relative orientation, for example the eight-point algorithm or the five-point algorithm; these are techniques for computing a direct solution, which means we do not need an initial guess. So what we can do is use those tools from the orientation of image pairs and execute them pairwise, image one with image two, image two with image three, image three with image four, and so on, in order to connect the image sequence and in this way come up with an initial guess. Of course this requires that this chain of relative orientations is never broken, that I never lose track, which is again a strong assumption and typically violated in reality, so you typically need to break the problem down into smaller chunks and try to come up with a good initial guess there; there is no closed-form solution for n views, I can only break the problem into smaller parts. If I instead take the perspective three-point algorithm, spatial resectioning, to estimate the orientation of my camera and then estimate the points in the environment through triangulation, what could be the problem, what may be suboptimal? There are a couple of reasons why this can be challenging. As I said, we may end up in situations where I lose track, because not in every pair of subsequent images can I find corresponding points; maybe there is strong motion blur somewhere, or a strong rotation, which can mean I find no corresponding points for a certain amount of time. There are also singular configurations in this reconstruction task, for example for the perspective three-point algorithm, if you remember, which would not lead to an appropriate solution. So not everything is perfect here; this is a way to go, and we typically use it to obtain an initial guess, but it is not guaranteed to work well, even if it typically does. The other issue is that if I have outliers in my pairwise data association, I will get a wrong transformation at some point, and then my initial guess will also not be great, so dealing with outliers or gross errors is a critical factor for coming up with a good initial guess. Computing the initial guess and dealing with outliers are therefore tasks that are coupled with each other.
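As an illustration of the pairwise chaining idea for the initial guess, here is a minimal sketch under the assumption that each relative orientation between consecutive images is given as a rotation and a translation whose scale has somehow been guessed; names and conventions are my own, not from the lecture.

```python
import numpy as np

def chain_relative_orientations(relative_poses):
    """Compose pairwise relative orientations into an initial guess for all camera poses.

    relative_poses: list of (R_rel, t_rel), where x_next = R_rel @ x_prev + t_rel maps
    points from camera frame i into camera frame i+1. Because each pairwise translation
    is only known up to scale, and errors accumulate along the chain, the result is
    only an initial guess that bundle adjustment then refines.
    """
    poses = [(np.eye(3), np.zeros(3))]   # the first camera defines the reference frame
    for R_rel, t_rel in relative_poses:
        R_prev, t_prev = poses[-1]
        poses.append((R_rel @ R_prev, R_rel @ t_prev + t_rel))
    return poses
```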
So the next thing I want to discuss is how we actually deal with the outliers that we have. Before we answer this, let us first ask what the reason for our outliers actually is, what can go wrong. There are basically two things: we can find wrong correspondences, and we can have wrong point measurements, so mistakes in the measurement process. It turns out that making the wrong data association, the first problem, is the real problem in reality; it is not how precisely we can perform a pixel measurement, it is rather that I compute point features in my images and mix up two of them, for example because I am picturing an object which is self-similar and it is very hard to say whether this is the point on the right or the point on the left, or because there are multiple objects in the scene that look the same. This leads to wrong correspondences, wrong data associations, and the question is how we can appropriately take that into account and address it.

There are different ways to tackle this. The first: if I have multiple observations of a point, and can make certain assumptions, I may be able to identify that within a set of observations there is an outlier. This brings me to the question of how many observations of a point I actually need in order to deal with outliers, at least to detect them or maybe even to identify which observation is wrong. Consider what happens if you see a point only once, in one single image: even if I knew the location of the camera perfectly, I only know that the point lies on some ray, the ray corresponding to that pixel, because every pixel corresponds to a direction vector; I have no idea how far away the point is, so I cannot even fully determine its position. If I see the point from two distinct locations, I typically get an intersection of the two rays, and through this intersection I can nail down the 3D coordinate of the point, but with two observations I cannot tell anything about an outlier; I can just estimate one coordinate and have to assume it is correct. If I have three observations of the same point, I am now able to detect that there may be an outlier: I have three images and I take all pairs, image one and two, two and three, one and three, perform the triangulation for all three pairs, and if the observations are outlier-free then all intersection points should end up at more or less the same location. If they do not, if they are spread out, I can say that one of the observations was wrong, but I typically cannot tell which one. If I have four observations of a point, I can repeat the same process, break it down further, and then, if there is one outlier, I am actually able to say which observation contains it. So the more observations I have, the more I can say about outliers, and as a rule of thumb you should have between five and six different observations of every point in order to get a high-quality estimate; make sure you see every point in the world from multiple positions in order to get a good 3D reconstruction of that point.

The next thing we typically do is that we do not solve the whole bundle adjustment problem at once; we break it down into small blocks, say blocks of three to six, typically more on the order of six, images, solve the small problem within those images, and look for statistical errors in this small set. If we find, through statistical tests, that a point may not be a good point, we eliminate it and remove it from our observations, in order to avoid taking a wrong data association into account; it is typically totally fine to ignore a few observations or a few points if that ensures we are in an outlier-free situation. I can also only consider features that I can actually track over a set of three to six subsequent images, especially if I am not dealing with random image collections: if I have a trajectory, for example because a vehicle is driving through the environment and observing the scene, I can exploit the sequential nature in which the data was obtained. I can say, for example, that I only take a feature into account if I can track it over multiple frames and obtain similar descriptor vectors, from my SIFT descriptor for example; if there are larger discrepancies, if a feature is hard to track or disappears and reappears, it is safer to ignore it. The last thing I can do, which is computationally more expensive, is to run a RANSAC procedure, a random sample consensus approach, which is a kind of trial-and-error approach: I guess a data association and then check how well this explanation fits with everything else the system knows. If I use, for example, the five-point algorithm to estimate the trajectory of my camera, using five corresponding points in an image pair, and combine this with a RANSAC-based procedure, I typically get a good initial estimate and can eliminate the outliers, at least those which I can identify from the sequential data. I do this for all the blocks, then fuse the blocks, and only in the end, after eliminating the gross errors, do I run my full bundle adjustment. This is the manual, or rather semi-automatic, way of getting rid of those outliers.

Something else you should do in your optimization process is move towards so-called robust kernels. Typically, in our least squares approach, we either assume everything is Gaussian, or we say we can identify and eliminate the outliers and for the remaining part everything is Gaussian. That is fine if I can really eliminate all outliers, then it is typically a valid assumption, but I cannot guarantee that, and typically some outliers remain. So what I can do is say: I am not using a quadratic error function, I am using a different function, in order to reduce the effect that outliers can have on my overall solution. One option is the so-called L1 norm, where I simply take the absolute value of my error as my kernel, so no quadratic function; this means that, depending on how far off an observation is, it has only a linear effect on the error rather than a quadratic one, which gives much less weight to points that are far away. A very popular choice is the Huber kernel, which is a quadratic function close to the minimum, so we stay in the Gaussian world there, and at some point the quadratic turns into a linear function; it is a combination of the Gaussian and the L1 norm, so to say. Whenever we are close to the right solution and in the outlier-free regime, everything stays in the Gaussian world and we get a solution very similar to the Gaussian one, but if we have a few outliers, their effect is only counted linearly and not quadratically. I can make it even more extreme and say: I take a quadratic form here, but as soon as the error of a point is larger than a certain value, it saturates to a constant; then it does not matter how large the outlier error is, it only matters up to a constant degree, and if a point is very far away from the zero-error configuration it is basically ignored, because the gradient, the Jacobian of this error function, is close to zero there, so it will not influence my system anymore. But this also means that the further you move away from the parabola, the better your initial guess must be: with a kernel like this, if you have a bad initial guess, the system will not even move you in the right direction, because nearly all the points will be considered outliers. These are three examples of robust kernels; there is an even more generalized description, a whole family of kernels, where this one is my quadratic function, my Gaussian, and then there are different ways of weighing down outliers the further out they are, up to the extreme case where, beyond some point, every error gives the same penalty no matter how far away it is. We can take those robust kernels into account and integrate them into our least squares system through a weighted least squares problem: I have a weight function, I compute a weight based on the current error of each observation, and I weigh every observation with this weight, which tells me how strongly I actually consider it. In the Gaussian world the weight is one, so nothing changes, and the more extreme the kernel, the more the weight decreases the further I am from the zero-error configuration. So these robust kernels can be integrated quite easily into our standard least squares by just introducing an additional weight, and that is something you typically do in all bundle adjustment systems. There are even approaches where you start with one kernel and then vary the kernel over the iterations, depending on how good your assumptions about your initial guess are or how much you know about your outlier configuration.
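The weighted least squares integration described here can be sketched as follows; these are the standard per-observation weights corresponding to the kernels mentioned above (the threshold parameter k and the function name are my own illustration).

```python
def robust_weight(residual_norm, kernel="huber", k=1.0):
    """Weight of one observation in an iteratively reweighted least squares step.

    kernel: "gaussian" (plain quadratic), "l1", "huber", or "truncated";
    k is the error magnitude at which the kernel leaves the quadratic regime.
    The weights are recomputed from the current residuals in every iteration.
    """
    r = abs(residual_norm)
    if kernel == "gaussian":          # pure least squares: every observation counts fully
        return 1.0
    if kernel == "l1":                # absolute value: influence grows only linearly with the error
        return 1.0 / max(r, 1e-9)
    if kernel == "huber":             # quadratic near zero, linear beyond k
        return 1.0 if r <= k else k / r
    if kernel == "truncated":         # beyond k the observation is effectively ignored
        return 1.0 if r <= k else 0.0
    raise ValueError("unknown kernel: " + kernel)
```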
Setting those kernels in the right way can therefore be seen as an art, but it is of course always good to reduce the outliers you can identify as outliers, and then try to cover those which remain with a robust kernel. Quite often, if you use an automated system, you do not have to bother with this too much, because there are fully automated solutions for bundle adjustment available that you can use out of the box, and they take care of a lot of these things: identifying features in your image data, breaking the problem down into small blocks, finding an initial solution, identifying outliers and getting rid of them, doing all the things I have talked about in an automated fashion. You may even be able to integrate ground control points, either by providing special tags or markers that the system can find autonomously, or you still need a human operator who says: this is a control point over here, that is another control point over there; but then the system can take that into account. Those systems are typically computationally quite demanding, especially if you cannot exploit certain properties that you as a designer know about your system. If you know that you have, for example, an autonomous car driving through the environment, you know that its motion can, for example, be described by an Ackermann model and the commands issued during the movement of the car, you know that your camera images have been taken at, say, 20 or 25 frames per second, that your car has a typical speed and cannot move sideways; all these things can be taken into account in order to constrain the problem and, especially, dramatically simplify your data association. The majority of the computational resources in those systems is spent on finding the right data associations, and if you can integrate background information that you have about your system, you may be able to do better. If, however, you do not take any constraints into account, if you have a freely moving camera in the environment without any further information, those commercially available or open-source systems do a really, really good job; it is very hard to reproduce those results, because a lot of engineering effort went into those systems. There are a couple of commercial software systems out there; quite popular are PhotoScan and Pix4D as two examples, but Meshroom has also become extremely popular over the last years. It is open-source software, so you have access to the source code, and the great thing about it is that you can actually modify all parts of that system, so you can easily replace, for example, the component for finding data associations if you want to add your additional background information about your system into the pipeline. If you want to play around with such software, either just for fun or for educational purposes, or you actually want to build a real system that provides high-quality 3D reconstructions, I can recommend Meshroom, and it would be the first choice I would start with. Last but not least, I want to talk a bit about the quality of the results. The key question is: now that I have computed my model, how good is the information that I have, what can I say about the resulting solution, can I provide uncertainties for the 3D points in the environment, for example? And yes, that is something I can do. From the standard least squares formulation we know that we can compute the so-called theoretical precision, which we obtain by
taking our Jacobian transposed, times the information matrix, times the Jacobian, and inverting this matrix; so it is the inverse of the matrix from our normal equations. If we multiply this with our variance factor, we obtain what is called the empirical precision, and the empirical precision tells us something about the uncertainty of our unknown parameters. The question, however, is how good this precision is: can we actually trust it, how good is the result that we are getting here? For the case of the relative orientation of an image pair, we can actually provide some information on how accurately we can determine certain parameters, the rotation for example, or the translation vector, based on the distribution of points in the environment; the Gruber points or double Gruber points have taught us something about how accurately we can estimate the relative orientation from corresponding points. For the bundle adjustment problem this is much more complicated, because it strongly depends on the scene itself, on the structure of the scene, and on the motion of your camera; where this is rather easy to quantify for, say, the stereo normal case with six or twelve points uniformly distributed over the space, it gets much more complicated in a real-world bundle adjustment system. What we should therefore always do is check our variance factor and see whether it takes a value of approximately one. If the variance factor takes a value of one, this suggests that we actually used a correct model, so we need to inspect the variance factor we compute and see whether we are close to one, yes or no. The question is what "close to one" actually means: does it mean exactly one, is 0.99 good, is 0.9999 good, what should I do? What you can do is perform a statistical test in order to judge whether your variance factor is close enough to one. This test needs to take into account the redundancy, so how many observations you have and how many unknowns you need to estimate, but it also takes into account the uncertainty that you have about your uncertainty, so the uncertainty of the measurement noise: how precisely can you specify the measurement noise, or is there some uncertainty in the measurement uncertainty you are providing? It turns out that if you run a standard F-test, which is what you would typically do here, then given the very high redundancy this F-test typically fails, and what you need to do is take the uncertainty of your measurement uncertainty into account, so take into account that you cannot precisely specify how accurate your sensor is; this information also has an uncertainty associated with it. You can still use the F-test by changing the redundancy based on the uncertainty you have about your measurement noise; you can encode it in this way, run the statistical test, and it then tells you: yes, your variance factor is close enough to one given the information at hand, so you probably used the right model. If this is the case, then this precision, the empirical precision, actually does a pretty good job of giving us a realistic estimate of the uncertainty of our parameters.
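Written out, and with notation that is my own shorthand rather than the lecture's (A the Jacobian or design matrix, Sigma_ll the covariance of the observations, v the residuals, r the redundancy), these quantities read roughly as:

```latex
\Sigma_{\hat{x}\hat{x}}^{\mathrm{theor.}} = \left(A^{\mathsf{T}}\,\Sigma_{ll}^{-1}\,A\right)^{-1},
\qquad
\hat{\sigma}_0^{2} = \frac{v^{\mathsf{T}}\,\Sigma_{ll}^{-1}\,v}{r},
\qquad
\Sigma_{\hat{x}\hat{x}}^{\mathrm{emp.}} = \hat{\sigma}_0^{2}\,\left(A^{\mathsf{T}}\,\Sigma_{ll}^{-1}\,A\right)^{-1}
```

with a variance factor close to one indicating that the assumed stochastic model is consistent with the residuals.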
That means if we have eliminated all the gross errors, have only a small systematic error, and our variance factor is close to one according to the statistical test, then we actually have a good estimate of the precision of the parameters through this equation; we can say something about how certain we are about the location of a point in the environment and how certain we are about the orientation parameters of our camera, and we can take that into account in an appropriate manner. I brought a few examples of where bundle adjustment systems are used. This is an example of a robot exploring catacombs, so underground structures where, for example, no GPS information is available; it was part of the European project ROVINA that I and a couple of colleagues carried out from around 2013 to 2016, building a robot which is equipped here with a rig of seven cameras and light sources. It moves through the environment and basically gets, at every point in time, a set of seven images; it moves through the environment in this direction, has this ring of cameras observing the scene, and can then always find correspondences, first among neighboring images and also between different time steps, estimate those correspondences in this way, put everything into a large bundle adjustment system, and perform a 3D reconstruction of the environment, so that in the end, in these underground environments, you can reconstruct the 3D information accurately, as shown here, and come up with highly accurate 3D models of the environment. You can also map texture onto those 3D models: this is basically just the surface information that was extracted, and if you overlay it with the color information, you can map the texture onto that surface. This was work done in the European project with several collaborators; Luc Van Gool's team from KU Leuven are the experts on this 3D reconstruction, who generated these dense, textured 3D models of the environment and were even able to take into account the reflectance of objects of different materials, classify these materials, and come up with really photorealistic reconstructions in the end. This is all done offline; that means the robot collects the data and navigates through the environment with other, local sensors, and then all the computations in terms of the bundle adjustment system are made offline. This takes a lot of computational resources and is something you typically cannot do online. There is another development in the robotics community which is called SLAM, or the visual SLAM problem, which is very, very similar to bundle adjustment. An example is a system called ORB-SLAM, which has been out there for a couple of years, developed at the University of Zaragoza. It basically extracts features from the environment and is targeted at real-time operation, estimating the trajectory of the system in the environment. You can see here a car driving through the environment, this is the KITTI dataset, and you see down here the estimated trajectory of the vehicle based on the green points extracted from the environment; these points provide the 3D information about the scene. As the system moves through the environment, it extracts new feature points in every image and aligns those feature points, and whenever the system comes back to a known location, like over here, it basically relocalizes against the previously computed map and performs a so-called loop closure.
A loop closure means that in the end those points are aligned, that you make a data association between the position where you are right now and a position where you have been in the past, so not just sequentially but also over larger time steps, and this loop closure allows you to reduce the uncertainty dramatically and come up with accurate estimates of the environment. Another loop closure will happen here very soon; you can see the loop closure has been executed, and the system then builds a consistent model of the environment, recomputing the least squares problem at every point in time, or performing a re-optimization whenever something leads to substantial changes. This is something you can actually do in real time. The accuracy of the results is not as good as in the underground mapping example I showed before, but this runs in real time and does not need a cluster to perform all the computations, so it depends on what you need in order to build up your map of the environment and localize and navigate in that map. Last but not least, I want to come back to the very traditional application in photogrammetry where you use an aerial vehicle to take images in nadir view, or close to nadir view, so looking downwards onto the surface, and try to estimate a map of the environment, the 3D structure of the scene; this is called aerotriangulation, or aerial triangulation. You see these old figures here with the image planes, the projection centers of the cameras, and the 3D points, a small set of distinct points on the surface which originally were identified by a human operator, if you think back 60 or 70 years; today, of course, we do this digitally and in an automated fashion. Today those images look like this: you fly with a UAV, you do not need a full aircraft anymore, over the ground with a downward-facing camera, you extract your 3D points rather densely, thousands of points per image are not a problem anymore today, and you are then able to build a 3D reconstruction. The question is how we actually build those maps: how do we fly over the scene in order to get a good model of the environment, how should my flight path be set up so that I actually get a good model? The good thing is that with aerial images you can fly patterns and cover the environment basically in stripes, with your camera only looking downward; that is easier than a full 3D reconstruction task with arbitrary viewpoints that you would need to take into account. A typical setup looks like this: you fly over the ground in stripes, then comes the turning maneuver of the airplane, which is of course much easier for a UAV because it can fly in arbitrary directions, then you fly back, and you basically cover the environment in stripes while making sure that you have a substantial overlap between the images. Something like sixty percent overlap in the flying direction and twenty percent overlap in the sideways direction is the minimum that you typically use; it can go up to ninety and eighty percent. Of course, the larger the overlap, the more images you need to take to cover the same area, but also the better your 3D reconstructions become.
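As a small numerical illustration of how the chosen overlap translates into the spacing of exposures and flight strips, here is a sketch under a simple square-footprint camera model; the footprint model and all numbers are illustrative assumptions of mine, not values from the lecture.

```python
def exposure_spacing(flying_height_m, focal_length_mm, sensor_size_mm,
                     forward_overlap=0.6, side_overlap=0.2):
    """Ground footprint and camera spacing for a nadir-looking strip flight.

    Assumes the ground coverage of one image is sensor size * flying
    height / focal length. The base between exposures shrinks as the
    forward overlap grows; the distance between strips shrinks with the
    side overlap.
    """
    footprint = sensor_size_mm * flying_height_m / focal_length_mm  # metres
    base_in_strip = footprint * (1.0 - forward_overlap)  # spacing along the strip
    strip_spacing = footprint * (1.0 - side_overlap)     # spacing between strips
    return footprint, base_in_strip, strip_spacing

# Example: 100 m flying height, 24 mm lens, 36 mm sensor width gives a
# 150 m footprint, about 60 m between exposures at 60 % forward overlap,
# and about 120 m between strips at 20 % side overlap.
print(exposure_spacing(100.0, 24.0, 36.0))
```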
Depending on how accurate your 3D reconstruction should be, you need a larger overlap, because then you see the same number of points but from a larger number of images, so you have more observations per point, and you also decrease the probability of making data association mistakes, because there is a smaller translation between those images; everything gets easier if you have a larger overlap. In the end it will look like this: consider that we have, say, 49 points over here, with four control points sitting in the corners, and you are flying over this area, taking the first image here, a second image here, a third image here, so you can actually see this overlap, and you fly this stripe pattern through the environment; this is the typical pattern used to cover an area, for example a field, as I showed before. The question is where you should actually place your control points, so where you should make the effort to precisely measure control points, which is a labor-intensive and thus expensive task, in order to get the best reconstruction. What you typically should do is place those control points at the boundaries of your mapping problem; so if this is your field and you want to estimate the field, the best thing you can do is put your ground control points on the outside, at the boundary, because this fixes the uncertainty there, reduces the uncertainty inside, and propagates the information about the control points to the inside. What you may also want to have, if you can afford it, are so-called height control points, which are these circles over here. Height control points are control points where you only know the altitude of the point. If you do a real flying mission and you are flying, for example, over a lake, this can be useful information, because you know that all those points lie at more or less the same height, so you may be able to obtain certain height control points rather easily. This matters because it prevents the estimated block from warping, from artificially rolling, so to say, in order to compensate for some of those errors; these height control points fix this and ensure that you get a good estimate of your surface. So what happens is that you take your first image, which may look like this, seeing these nine points, and then you fly on and your second image sees these nine points, which means these six points here are the overlap seen in both images; the third image may be over here, so these points are the overlap with image number two and these points are still the overlap with the first image. You move through the environment like this, always re-observing points, observing new points, and observing some of your control points, and all of this is taken into account and fed into the bundle adjustment approach. So if you have only a certain number of control points that you are willing to pay for, so to say, you should distribute those control points at the boundaries of your mapping problem. The question is what else we can do in order to simplify the problem and to reference the map that we build with respect to, let us say, an official coordinate system or some map information, and for that, something you do today is exploit GPS information, or GPS and IMU information.
If you can afford it, this can even be differential GPS and IMU information. That means you have a receiver from which you obtain an XYZ coordinate; depending on how much money you pay and how much technology you put in, this can be very accurate or rather imprecise. You typically also have an inertial measurement unit, which helps you get an estimate of the orientation, or of the change in orientation, and you have your camera, and of course you need to calibrate these three sensors with respect to each other. But if you have that, you get rather high-frequency information about where the camera images have been taken. Let us say your GPS or DGPS provides you with an accuracy of five to ten centimeters; you also need to take into account that you are moving quite quickly with an airplane, so time synchronization can be a challenging factor here, and even if in a static setting you can be better than those ten centimeters, flying very fast may change your setup quite a bit. You can see this basically as placing control points in the projection centers of your camera: you are fixing the camera locations in XYZ through this GPS information, and you can even use a combination of GPS and IMU to estimate the orientation of your camera, because from the GPS and IMU information you can estimate the trajectory of your aerial or ground vehicle, and this dramatically helps to also get an estimate of the roll, pitch, and yaw information. Having this information at hand generates additional observations for your bundle adjustment problem, in this case basically adding noisy control points, or noisy information not only about the positions but also about the orientations, which is then taken into account. You may ask yourself: if I have a good GPS or a good DGPS available, do I need ground control points at all? Do I really need to go into the field and measure locations with an additional measurement device in order to get these coordinates precisely and feed them into my bundle adjustment system? In reality, only a small number of control points is needed, and maybe you do not need them at all; but if you want to align the model that you build with official map data at a very high precision, you still need to take them into account. They also help you to eliminate systematic errors that you may have, so having a certain number of control points can be a great tool, and it may also help you to determine the differences between coordinate systems, for example between the GPS coordinate system and the coordinate system of the map with which you want to align your model. So there is still a use for ground control points, but the number of control points you need has been dramatically reduced by this technology. Ground control points are hard to obtain in reality. Why is this the case? You need to go there, you need to measure the point, and you also need to signalize or flag those points so that you are able to find them in your image data. This is done by sending a person there; they paint or mark the control point, they precisely measure it over an extended period of time, and then you have the precise location of this point; you then need to go back to your image material, find those signalized control points, mark them, and feed them into your bundle adjustment system.
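To picture how GPS readings of the projection centers can enter the adjustment as the noisy control points mentioned above, here is a minimal sketch; the function, its arguments, and the assumed 10 cm standard deviation are my own illustrative choices, not a specific library's interface.

```python
import numpy as np

def gps_prior_residuals(camera_positions, gps_measurements, sigma_gps=0.10):
    """Extra residual block tying estimated projection centers to GPS fixes.

    camera_positions: Mx3 array of currently estimated camera centers.
    gps_measurements: Mx3 array of (lever-arm corrected) GPS positions.
    sigma_gps: assumed standard deviation of a GPS fix in metres.

    These residuals are simply stacked on top of the reprojection
    residuals of the bundle adjustment, so the GPS acts like a noisy
    control point placed in every projection center.
    """
    diff = np.asarray(camera_positions) - np.asarray(gps_measurements)
    return (diff / sigma_gps).ravel()
```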
So this is something that is time-consuming and expensive, because it requires manual labor, and you will try to avoid it; but depending on your constraints in terms of precision, and also with respect to consistency with official map data, for example, you may still need to do it. In the end, the aerotriangulation approach allows you to use photogrammetric tools and image data to estimate maps at comparably large scales, because you can fly over large areas and build maps from that relatively cheaply, and with a typical setup you can roughly get down to an uncertainty in the X and Y location of approximately 2.5 centimeters. This is an estimate of a good uncertainty about the location of points in the XY plane; typically you have a higher uncertainty in the altitude, but in the XY plane you are on the order of a standard deviation of 2.5 centimeters. If you want to get as accurate as possible, large overlaps are recommended, maybe also exploiting some height control points, and of course the more you invest into your IMU and GPS configuration, the better off you are. You can easily spend 100,000 euros on a great IMU, which can dramatically reduce the error in your orientation, especially the angular parts; combine this with a high-quality GPS, invest a lot of effort into time stamping and into calibrating your system, and you can push your accuracy further, but it gets expensive, it gets labor-intensive, and you really need to know what you are doing if you want to go substantially beyond that. Okay, thank you very much for your attention; this brings me to the end of the lecture. Today we have been talking about bundle adjustment, which is a least squares approach to the relative and absolute orientation problem for cameras, considering uncertainties. We estimate the locations and orientations of the cameras in the environment as well as the 3D locations of the points in the world, taking all the uncertainties into account, and come up with an as-good-as-possible estimate. It is a statistically optimal approach that minimizes the reprojection error, so the error between the estimated points projected into the estimated camera images and the actually measured image points. Bundle adjustment is the gold standard today and a statistically optimal solution under the assumptions we have been making, and we discussed what we can do to get an initial guess, what we need to do to deal with outliers, which statements we can make about the obtained precision, how this is used in different applications, and what error we are actually minimizing. What we have not talked about so far is how to solve the underlying linear system, how exactly to set it up, what the Jacobian matrix looks like, and what we need to do so that we do not have to deal with millions of dimensions in our state estimation problem; that is something we are going to discuss in the next lecture, where we look into the numerics of the bundle adjustment problem and the tricks we can use to solve it in an efficient manner. If you want to dive deeper into this, there are several resources that I can recommend. The paper "Bundle Adjustment: A Modern Synthesis", although it is by now roughly 20 years old, is still a very good read and a standard reference for the bundle adjustment problem.
The book by Hartley and Zisserman, Multiple View Geometry, also addresses not only bundle adjustment but all the geometric reconstruction tasks from camera images and is a further recommended read, and the photogrammetric computer vision bible by Wolfgang Förstner and Bernhard Wrobel is a very good resource if you want to go deeper and study these aspects. You can invest a lot of time into doing this right, into estimating all the uncertainties correctly and into integrating additional models, so there is a lot you can do in order to get the best possible maps out. As a last word: the bundle adjustment problem is very similar to the visual SLAM problem. SLAM stands for simultaneous localization and mapping, a term coined in the robotics community, and SLAM is a somewhat more general formulation than the bundle adjustment problem, taking different sensor modalities and different motion models into account, but overall they are more or less the same thing. The bundle adjustment problem is basically one instance of a SLAM problem where you use a least squares approach for the estimation; you may also use other techniques to solve the SLAM problem, but if you are interested in getting the statistically optimal solution, that SLAM problem will be nothing else than the bundle adjustment approach, maybe just with different error functions, because you may interpret your sensors in a different way if you do not only have a camera but, for example, a camera and a laser scanner, and maybe you combine this with other motion models or some constraints about your vehicle; the overall idea, however, is very similar and basically boils down to the same thing. So with this, I hope this was useful and gave you an idea of what bundle adjustment is, how to use it, and how to apply it; next time we will look into the numerics of how to solve the underlying least squares problem in an efficient manner. Thank you very much for your attention.
Info
Channel: Cyrill Stachniss
Views: 10,497
Rating: 5 out of 5
Keywords: robotics, photogrammetry
Id: sobyKHwgB0Y
Length: 83min 12sec (4992 seconds)
Published: Sat Sep 26 2020