CVFX Lecture 18: Stereo rig calibration and projective reconstruction

Okay, so where we stopped last time: if I've got a single camera, how do I calibrate it, meaning how do I estimate its internal parameters like the focal length and the principal point? We talked about how we do that basically by showing the camera a checkerboard pattern that we wave around; the positions of the corresponding corners of the checkerboard are what we use to figure out how the camera undertakes the image formation process. We also mentioned this idea of resection: if I know the real-world 3D positions of points and I know the projections of those points on the 2D image plane, I can also use that to form up camera matrices.

So the first thing I want to talk about today is what I would call stereo rig calibration. The setup there is that I have a rigid bar or a rigid housing that contains a pair of cameras, and these cameras are stuck to this bar so they can never change their relative rotation and translation. What I want to do is figure out how to simultaneously calibrate these cameras, meaning how can I simultaneously estimate the camera matrices for both cameras? We already know that since these cameras are separated by a physical baseline, the two cameras are related by a fundamental matrix, so the intuition is that we should be able to get at the camera matrices via the fundamental matrix, which we can estimate using the techniques we talked about in the last chapter: find feature points, then use the feature points to estimate the fundamental matrix. So now I've got the fundamental matrix; how could I use it to estimate the cameras? Well, there's good news and there's bad news. The good news is that yes, you can do this. The bad news is that you have some extra degrees of freedom, which means that there is no unique pair of camera matrices that corresponds to a given fundamental matrix.
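The "estimate F from feature correspondences" step mentioned here can be sketched with the classic normalized eight-point algorithm. This is an illustrative numpy sketch, not code from the lecture, and the function names are my own:

```python
import numpy as np

def _normalize(pts):
    """Hartley normalization: translate points to their centroid and
    scale so the mean distance to the origin is sqrt(2)."""
    c = pts.mean(axis=0)
    d = np.mean(np.linalg.norm(pts - c, axis=1))
    s = np.sqrt(2) / d
    T = np.array([[s, 0, -s * c[0]],
                  [0, s, -s * c[1]],
                  [0, 0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return T, pts_h

def estimate_fundamental(x1, x2):
    """Normalized eight-point algorithm. x1, x2: (N, 2) arrays of
    corresponding pixel coordinates, N >= 8. Returns F such that
    x2_h^T F x1_h ~ 0 for homogeneous image points."""
    T1, h1 = _normalize(x1)
    T2, h2 = _normalize(x2)
    # One row of the homogeneous system A f = 0 per correspondence.
    A = np.column_stack([h2[:, [0]] * h1, h2[:, [1]] * h1, h1])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint, then undo the normalization.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    return T2.T @ F @ T1
```

In practice you would wrap this in RANSAC to reject bad correspondences, but the linear core is just this null-space computation.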
Right? Well, the way to think about this is to look at the camera matrices. Let me use the notation from the book: I'll call these P and P′, and each is 3 × 4. We know there's already a scale degree of freedom in each one, so really there are only 11 parameters to estimate in P, because it's unique up to scale, and likewise for P′. But F only has 3 × 3 = 9 entries, and we know there are really only seven unique degrees of freedom in it. So we've got twenty-some degrees of freedom for the P's, even if we assume things like square pixels with no skew, but only 7 degrees of freedom in F. You can see I've got way too much stuff to estimate in P and P′ to get it just from knowing F, and that leads to ambiguities: the same F can correspond to many different combinations of P and P′. Okay, so that's the bad news.

Now, some of those ambiguities are relatively benign. For example (this is kind of hard to show with the overhead), here's my stereo rig; I could rotate and translate it over here, or rotate and translate it over there, and clearly I'm always going to get the same fundamental matrix between the two image planes, but all those poses have different pairs (P, P′). Those differences are all attributable to an underlying rigid motion, so six of those degrees of freedom of ambiguity are taken into account by this unknown rotation and translation. What I can do is fix the camera center of the first camera at (0, 0, 0) and assume the world coordinate frame is lined up with the first camera — more accurately, something like this, where the x and z axes lie in the page and y points up out of the page. Fixing the frame of the first camera removes some degrees of freedom, but it turns out there are still some left, and we'll talk about that remaining ambiguity more in the context of today's lecture.

First of all, let's see how the fundamental and camera matrices are generally related. To mathematically formulate what I just said, let's assume the first camera matrix looks like P = K[I | 0], where I is the 3 × 3 identity matrix, 0 is a 3 × 1 vector of zeros, and K is the camera calibration matrix from before, containing the focal length, the principal point, and so on. So really there are only a few things to estimate for this P. Now the second camera, P′, may have a different calibration matrix, and it's located in some different place, so it has a generic rotation and translation: P′ = K′[R | t], where R and t specify where that camera sits in the coordinate system of the first camera. If I think about what that means — again, here's my rigid bar that fixes the two cameras together — it's saying that in world coordinates the center of the first camera is (0, 0, 0), and the center of the second camera turns out to be −Rᵀt (you can prove that's true).

So how do I figure this out? Let's first think about what the epipoles of the stereo rig would be. The epipoles are basically what you get if you project the camera center from one camera onto the other camera's image plane, and actually they're not too hard to compute. The projection of the second camera's center onto image one is the first camera matrix applied to that center: P times −Rᵀt.
Up to scale, what I get is e ~ K[I | 0](−Rᵀt, 1)ᵀ = −KRᵀt, and since everything is up to scale I can get rid of the negative sign; the tilde here means "equal up to scale," so e ~ KRᵀt. This is the epipole: it says where I see the second camera's center over in the first image plane. The other epipole goes the other way around: I take the second camera matrix and project the first center (0, 0, 0) onto it, which gives e′ ~ K′t. So I can figure out where the two epipoles are in the images; working this out is a little bit of homework.

If I put all this stuff together, I can figure out the fundamental matrix in terms of all these camera matrices — this is a homework problem that I don't think I assigned, because it's a little messy, but it leads you through it. It turns out to be

F ~ [K′t]× K′ R K⁻¹,

and I'm just going to write this out for posterity. So what does this notation mean? If I have some 3 × 1 vector a = (a₁, a₂, a₃)ᵀ, then [a]× is defined as the 3 × 3 matrix

[a]× = [ 0 −a₃ a₂ ; a₃ 0 −a₁ ; −a₂ a₁ 0 ],

which has zeros down the diagonal and is anti-symmetric (skew-symmetric) across it; it's the matrix that implements the cross product, so [a]× b = a × b. This special matrix plays a role in the equation, and the equation basically tells us: if I knew the two camera matrices, how would I produce the corresponding fundamental matrix? This is a useful formula to know, because it tells me, given the cameras, what the corresponding F is.
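The formula just written can be checked numerically. This is a sketch under the same setup (P = K[I | 0], P′ = K′[R | t]); the function names are mine:

```python
import numpy as np

def skew(a):
    """[a]_x: the 3x3 matrix such that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0, -a[2], a[1]],
                     [a[2], 0, -a[0]],
                     [-a[1], a[0], 0.0]])

def fundamental_from_cameras(K1, K2, R, t):
    """F for the pair P = K1 [I | 0], P' = K2 [R | t].
    Note the second epipole is e' ~ K2 t."""
    return skew(K2 @ t) @ K2 @ R @ np.linalg.inv(K1)
```

A quick sanity check is that any scene point projected into both cameras satisfies the epipolar constraint x′ᵀ F x = 0, and that F annihilates both epipoles (F e = 0 and e′ᵀ F = 0).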
What I really want to do in this problem is go the other way: I give you F, and you unwrap it back to the corresponding parameters K, K′, R, and t. That's where we run into the ambiguity problem, because it's easy to take the camera matrices and figure out the corresponding F, but it's hard to go in the other direction. Part of the reason is the following. The key idea behind the camera matrix is that my little image point in homogeneous coordinates is the camera matrix times some point in space: x ~ PX. Here P is the camera matrix, a 3 × 4 matrix; X is the scene point in homogeneous coordinates, which is 4 × 1; I multiply these two things together and get a 3 × 1 vector. That's how image projections are formed. But suppose I gave you a P and an X. The bad news is that I could put a new 4 × 4 matrix H in the middle, replacing P with PH and X with H⁻¹X, and as long as H is any non-singular matrix, I achieve exactly the same image projection. So even if I were to estimate a set of P's and X's that perfectly reproduced all the image projections, there would still be the possibility of this other weird matrix interposed in the middle, such that I would never know what the truth was. H could be the identity matrix, in which case I get the original P and X; it could be some rigid motion, which is like translating and rotating the stereo rig; or it could be some sort of weird projective change that totally warps my cameras and points.

Let me give you a sense of that. Here's a picture showing the differences in the 3D points that could occur by applying various choices of H in the middle. Say the truth is (a). If the H I applied corresponded to a rigid motion, and maybe also a scale change, then the 3D points would look pretty much like the points I had originally, just viewed differently and maybe bigger or smaller; that wouldn't be so bad. But the problem is that the effects of this H matrix on the 3D world can be even worse than that. There are H's that are equivalent to shear transformations — remember, that's like pushing everything along one of the x, y, z axes — so that's what I'd get if I sheared this house along a 3D axis, and it would look pretty strange. And unfortunately the projective ambiguities can produce even more bizarre things, where straight lines that I hoped would stay straight and perpendicular are no longer straight and perpendicular; you get very weird-looking deformations of the 3D scene.

The problem is that you can't really tell the difference unless you apply some more information. We'll talk about this more next time, but the idea is that you can go from a distorted reconstruction back to the true one with an assumption that the cameras are not allowed to be too weird. One way of thinking about this is that some of these weird combinations of P's and X's correspond to camera matrices that we know we would not physically tolerate — for example, they may have pixels that are not square, or pixel aspect ratios that are not one-to-one — so we can use those real-world constraints to bring ourselves back to a good reconstruction. But for the moment, the thing to understand is that there is this ambiguity we have to deal with, and for now we can't do anything about it. So let me give you a formula that tells you the best you can do: given the fundamental matrix, what is one pair of camera matrices that would work?
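The ambiguity being described is easy to demonstrate numerically: interposing any non-singular 4 × 4 H between the cameras and the points leaves every image projection unchanged. A minimal sketch (the particular H below is just an arbitrary invertible example with a projective bottom row):

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=(3, 4))                # some camera matrix
X = np.append(rng.normal(size=3), 1.0)     # a homogeneous scene point

# Any non-singular 4x4 H works; this one has a projective bottom row.
H = np.array([[1.0, 0.2, -0.1,  0.4],
              [0.0, 1.1,  0.3, -0.2],
              [0.0, 0.0,  0.9,  0.5],
              [0.1, 0.0,  0.05, 1.2]])

x_true = P @ X
x_warp = (P @ H) @ (np.linalg.inv(H) @ X)  # warped camera and warped point
assert np.allclose(x_true, x_warp)         # identical image projections
```

Since (PH)(H⁻¹X) = PX for every point, no amount of image data alone can distinguish the warped reconstruction from the true one.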
So here is one possibility for getting the two camera matrices back from F. I have the freedom to fix the first camera matrix as basically the simplest camera matrix I can have: it sits at (0, 0, 0), looks straight ahead, and has the identity as its calibration matrix, so P = [I | 0]. The other one looks like

P′ = [ [e′]× F + e′vᵀ | λe′ ],

where I'm again using the cross-product notation I talked about before, v is any 3 × 1 vector, and λ is any nonzero scalar. All the other stuff in this equation I can get from the fundamental matrix: I know F, presumably estimated from point correspondences, and I can estimate the epipoles in both images from F — they're basically the eigenvectors of F and Fᵀ corresponding to the zero eigenvalue (the null vectors). Then I have four degrees of freedom, v and λ, where I can plug in whatever I want. For example, I could just use v = (0, 0, 0)ᵀ and λ = 1, and that would give me a pair of camera matrices that produces the same fundamental matrix. And here you can see explicitly the four degrees of projective ambiguity that I don't know how to resolve. Like I said, there's not much you can do; if you wanted to, you could try to choose the vector v to make the left-hand 3 × 3 block of this matrix look as close as possible to a rotation matrix — there are details about how to do that in the book — but unless you have more information, this is about as well as you can do.

Okay, so in real life we often do know a little bit more than this. One thing we usually know is the internal parameters: we can calibrate the K matrices of these cameras using the checkerboard algorithm I talked about last time.
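The canonical pair just written can be constructed directly from F. This is a sketch of that construction with the simplest choice v = 0, λ = 1; a useful check is that the pair it returns induces the original F back (up to scale):

```python
import numpy as np

def skew(a):
    """[a]_x, so that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0, -a[2], a[1]],
                     [a[2], 0, -a[0]],
                     [-a[1], a[0], 0.0]])

def canonical_cameras(F, v=(0.0, 0.0, 0.0), lam=1.0):
    """One (P, P') pair consistent with a rank-2 fundamental matrix:
    P = [I | 0],  P' = [[e']x F + e' v^T | lam e'],
    where e' (the second epipole) is the left null vector of F."""
    U, _, _ = np.linalg.svd(F)
    e2 = U[:, -1]                          # e'^T F = 0
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    M = skew(e2) @ F + np.outer(e2, np.asarray(v))
    P2 = np.hstack([M, lam * e2[:, None]])
    return P1, P2
```

For a pair P = [I | 0], P′ = [M | m], the induced fundamental matrix is [m]× M, which is what the test below compares against the original F.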
Life becomes a lot easier if we know K and K′ already — and like I said, we may be able to get these from plane-based calibration — because then the only things left to estimate are the rotation and translation from camera one to camera two. What I can do is transform the image coordinates by undoing the calibration matrices, multiplying through by K⁻¹ and K′⁻¹. After that, the two camera matrices are in a particularly nice form where both calibration matrices become the identity: [I | 0] and [R | t]. Then all I have to do is estimate R and t. So I apply this normalization to the image coordinates, and I estimate the fundamental matrix again between these so-called normalized coordinates. In these particular circumstances, when I know the camera calibration matrices, the matrix I get has a special name: it's called the essential matrix E. It's still a fundamental matrix, but I call it the essential matrix when I happen to know K and K′.

And it turns out that in this case I really can nail down the rotation and translation. If I estimate E, I can take its singular value decomposition, E = UDVᵀ, and then there are basically four options for P′:

P′ = [UWVᵀ | +u₃], [UWVᵀ | −u₃], [UWᵀVᵀ | +u₃], or [UWᵀVᵀ | −u₃],

where W is a special matrix made up of zeros and ones,

W = [ 0 −1 0 ; 1 0 0 ; 0 0 1 ],

and u₃ is the last column of U. The idea is that all of this stuff comes from the essential matrix alone: I take E, decompose it with the SVD, and obtain candidate camera matrices from U, V, this special matrix W, and the last column u₃. So this tells me I've gone from a family with four degrees of freedom down to one of four options — not four degrees of freedom, but literally only four possibilities.

Furthermore, I can resolve which of the four it is by what I would call triangulation. These options correspond to camera pairs that have fundamentally the same locations and offsets but may be flipped in different directions. So I have four possible pairs of image planes, and there's only one pair such that, if I project rays through the corresponding image points and figure out where the rays intersect, the intersections are in front of both cameras. I can tell I've chosen wrong by taking a candidate, looking at where the points intersect, and finding that they land behind one of the cameras; that can't be the right choice. So fundamentally, all you have to do is shoot some rays through the image correspondences you've got and make sure the triangulated points are in front of both cameras; that resolves which of the possibilities is correct.

So in practice, it's not really that hard to calibrate a stereo rig, because we already knew how to calibrate the internal parameters of each camera with plane-based calibration. The overall pipeline would be: mount your two cameras together, screw them down so they never move, and then show both cameras the checkerboard.
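Backing up to the decomposition step: the four candidate camera pairs hidden in an essential matrix can be sketched as follows. This is an illustrative implementation (function names mine); the sign fixes on U and V are needed because E is only defined up to scale and sign:

```python
import numpy as np

def skew(a):
    return np.array([[0, -a[2], a[1]],
                     [a[2], 0, -a[0]],
                     [-a[1], a[0], 0.0]])

def decompose_essential(E):
    """The four candidate (R, t) pairs for an essential matrix E.
    t comes out only up to scale (unit length here); the correct
    candidate is the one that triangulates points in front of both
    cameras (the cheirality test described in the lecture)."""
    U, _, Vt = np.linalg.svd(E)
    # Flip factors so the resulting R's are proper rotations (det +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0, -1, 0],
                  [1, 0, 0],
                  [0, 0, 1.0]])
    u3 = U[:, 2]
    return [(U @ W @ Vt, u3), (U @ W @ Vt, -u3),
            (U @ W.T @ Vt, u3), (U @ W.T @ Vt, -u3)]
```

All four candidates reproduce the same E up to scale, which is exactly why E alone cannot distinguish them and the triangulation check is needed.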
If the cameras are not too far apart, you can probably get away with showing both of them the same checkerboard views. You use the methods we talked about last time to estimate K and K′; then you estimate the fundamental matrix between the two cameras from a set of feature correspondences, which again shouldn't be too hard to obtain — you could actually get them from the corner correspondences on the checkerboard you showed for the internal calibration. Then you take the SVD of the resulting essential matrix and apply the decomposition above to get R and t. So really the whole thing is not that hard if you've got a nice calibration test pattern. In theory you could do the same thing without a checkerboard: with your cameras mounted on the rig, you move the rig around the world, look at the pictures it takes, use the feature tracks as surrogates for the checkerboard corners, and estimate everything at once if you wanted to. But in practice it's not hard at all to just show it some sort of test pattern that you have in your bag; that's probably easier and faster than trying to get it all from tracked features.

Okay, so that's the quick overview of stereo rig calibration. Again, if you buy a stereo camera, you generally shouldn't have to do this yourself. For example, I bought this Point Grey camera called the Bumblebee — let me see if I have a picture of it. Here's an example of a stereo camera where everything is already inside a rigid housing, and you should never have to recalibrate it, because there's no way these cameras are physically going to move; they're really bolted together. If you were doing it yourself — say you had a kind of DIY camera rig that you had built especially for some sort of production, and then maybe you put it in storage and it wiggles around and so on — then you might have to do occasional recalibration, but with a camera like this you can probably be pretty happy with the factory calibration. In fact, I also have one — actually, the one I have has got three cameras on it. Where's my picture? I saw it before, but then I clicked and it went back. Anyway, hopefully you can see here that this is the XB3. So I've got this one too, and you can think of it as three potential pairs of stereo cameras all in one device: if I wanted to, I could use the shorter baseline, or the larger baseline, or all three cameras at once.

Okay, so any questions about stereo rigs? Like we talked about last time, the way a stereo camera rig looks in Hollywood is much bulkier than this. Let's see if there's a picture — I haven't prepared this, so let's see what it looks like. Here is a good example where you can see that the two cameras are actually mounted at right angles, outside the direct optical path, and there's a half-silvered mirror passing the images through. You can still calibrate this camera pair exactly the same way, because the effective positions of the cameras are the same as if they were side by side on a narrow baseline, even though the physical configuration may be quite different. I was just going to say that Eric Ameres over at EMPAC has done some stuff with stereo filming, and I know he's made his own kind of do-it-yourself stereo camera, and I'm pretty sure he did it like this. Obviously, you can imagine this is not the kind of thing you could just knock together at home without a few specialized parts; the half-silvered mirror especially is not something you can just buy for a couple hundred dollars of parts and get working, but EMPAC can afford to do that.
Okay, so the main thing we need to talk about in this chapter is that I don't really care about just a pair of cameras. Typically I care about the path of a camera as it moves around an environment. Maybe I have a shot that contains a hundred frames, and I want to know where the camera was at every one of those hundred frames. So really, instead of having P and P′, I have P_k, where k indexes the camera position. What we're getting into next is what I'd call image sequence calibration, and like I said at the very beginning, the heart of this problem is the computer vision problem called structure from motion. You could argue that there are some fine semantic differences between structure from motion, matchmoving, and camera tracking, but in my mind they're all fundamentally the same problem, so I'm a little bit hand-wavy about the terms.

Generally the way it works is the following. Big picture: let me first state the steps, and then I'll talk about what they mean. Projective reconstruction is the first step. What that means is that I have a bunch of observations x_ij of the j-th point in the i-th camera, and I know they are related, up to scale, to the i-th camera matrix and the 3D position of the j-th point: x_ij ~ P_i X_j. What I'm going to do is try to estimate these P's and X's to make the difference between the observed projections on the left-hand side and the predicted projections on the right-hand side as small as possible. But the problem is that, again, I can insert that weird matrix H in the middle and still match the image projections very well. So projective reconstruction gets me there numerically but doesn't get me there practically; I need to know more to be able to solve the problem I'd actually use in visual effects.
The next step is what's called Euclidean, or metric, reconstruction. Crudely, this is where we use facts about the form of the camera matrices: I know the camera matrix can't be some wild matrix; it has to have a nice K and a nice rotation matrix. We'll talk about that part more in the next lecture. The third step is what I would call bundle adjustment, and the way I think about this is that I parameterize the P's and X's and then do some numerical optimization. One way to look at it is that the first two phases are the initialization of an optimization problem, and the optimization problem I really want to solve is bundle adjustment.

Let me be a little more precise about that optimization problem. The setup for the key optimization problem, a.k.a. bundle adjustment, is that I have a model with m cameras and n 3D points, and I also have an indicator variable w_ij that is 1 if point j is seen by camera i and 0 otherwise. It's possible that not all the 3D points are seen by all the cameras. Certainly one instance of that is a camera moving through a scene: some of the points fall off one side of the image and new points appear at the other edge, so naturally not all the points are seen by all the cameras at once. Question? Yes — it's assuming the scene is mostly static, so you have to make sure that you choose your 3D points on objects that are not moving. You would never want to do bundle adjustment on, say, a street scene with your points on the pedestrians and the cars; you want them on the buildings and the streets. So yes, that's a good point: we're assuming a static scene.

What I want to solve is, basically: minimize the sum over all the cameras i and all the points j of the distance between the observation x_ij and the point I would get by projecting hypothesis X_j for point j through hypothesis P_i for camera i. I only include that distance over (i, j) pairs where I actually saw the point in that camera, and the minimization is taken over all the points and all the cameras at once. As you can imagine, this is a really big minimization problem: I've got maybe hundreds of cameras and thousands of points, so there are lots and lots of variables, and that's why initialization of the problem is so important. If you've ever done any numerical optimization in MATLAB, you know MATLAB provides tools like gradient descent for minimizing a cost function, but the problem with those tools is that they can only find local minima, which means you want to start your optimization close to the truth; otherwise, the answer the numerical optimizer comes up with may be locally good but far from the global best. So initialization is really important, and that's really what we're going to start talking about today.

This quantity here is what I call the reprojection error. The distance d lives in the image planes of the cameras: it says, in image coordinates, how far away is the observed point x_ij from the projected point I would have gotten if camera i was here and point j was out there? I want to make those as similar as possible. One reason it may not be clear why this is called bundle adjustment is the following picture, which may make it clearer: what I have are correspondences — I know that the black dot matches itself in these three images, and the same with the gray dot and the light gray dot.
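As an aside, the objective just described can be written out concretely as a residual function. This is a minimal sketch (names mine); in a real system you would parameterize each P_i by its K, R, t and hand these stacked residuals to a nonlinear least-squares solver:

```python
import numpy as np

def reprojection_residuals(cams, pts, obs, vis):
    """Stacked reprojection-error residuals for bundle adjustment.
    cams: list of 3x4 camera matrices P_i; pts: (n, 4) homogeneous
    points X_j; obs[i][j]: observed 2D point x_ij; vis[i][j]: 1 if
    camera i sees point j, else 0.  Bundle adjustment minimizes the
    sum of squares of the returned vector."""
    res = []
    for i, P in enumerate(cams):
        for j, X in enumerate(pts):
            if not vis[i][j]:
                continue                    # skip unseen points
            p = P @ X                       # predicted projection P_i X_j
            res.append(obs[i][j] - p[:2] / p[2])
    return np.concatenate(res)
```

At the true cameras and points the residual vector is exactly zero; a solver adjusts the parameters to drive it as close to zero as it can.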
What I don't know are the white dots, which are the camera locations and orientations, and the big dots out here, which are the actual 3D positions of the points. What I'm trying to do is move the white dots and the big dots around so that, when I look at the set of rays that connect each camera center to all the big dots in the scene, those rays coincide as well as possible with where the points should be on the image planes. You can think of this set of rays emanating from a camera center as a bundle, and the process as adjusting these bundles of rays: I'm moving each bundle around to make it match up with the image projections I observed as well as possible. It's kind of a weird name, but this is the idea — these are the bundles in bundle adjustment. So that's the main concept. Questions about what we're trying to do? You're still a little skeptical? No? Okay.

So let's talk about the first step in this process. I can't get through the whole process today, but I want to get through the first step, which is called projective factorization. Let's start with an initial assumption: suppose that all 3D points are seen by all cameras. That's certainly not a realistic assumption, but one thing I could do, for example, is keep throwing away points and throwing away cameras until I get a small set of points that is seen by a small set of cameras. The way I think about this: suppose these are my camera positions and these are my 3D points out here; since this point is not seen by this camera, and this point is not seen by that camera, maybe the first thing I do is choose some subset of the cameras and some subset of the points such that all the chosen cameras see all the chosen points. This is just to seed the problem; we'll talk about how to fill in the other cameras and points later.

What that means, going back to my previous slide, is that the indicator variable w_ij is 1 for all the points and all the cameras, because everything sees everything. So now I've got this system where I know every point is seen by every camera. Another way to go about this is that I can replace the up-to-scale tilde with an equal sign by introducing some scale factor λ_ij, which I also don't know, that makes it an equality: λ_ij x_ij = P_i X_j. That's like saying that if I knew the scaling between what's on the right-hand side and what's on the left-hand side, it would be represented by this λ, and this λ_ij is what's called the projective depth of a point.

So now what I can do is form up everything I know about these x's into a big matrix — and actually, for my own ease, I'm going to move the λ_ij over to the left with the x's; it doesn't really matter which side the scaling factor goes on. So I can collect the data as

[ λ₁₁x₁₁ ⋯ λ₁ₙx₁ₙ ; ⋮ ⋱ ⋮ ; λ_m1 x_m1 ⋯ λ_mn x_mn ],

which basically collects all the stuff that I have. This is going to be some massive matrix: each of these entries is a 3 × 1 vector, I have m cameras and n points, so it's a 3m × n matrix. We call this the measurement matrix. And the nice thing about the measurement matrix is that even though it's really big, it is produced by knowing each of the cameras and each of the points: I can write it as a product where, if I take the stacked camera matrices and the points, what's on the left-hand side exactly matches the product on the right-hand side.
left-hand side should exactly match up with what's on the right-hand side. So here, this is a 3M by 4 matrix of stacked cameras and this is a 4 by N matrix of point locations, and I can see that when I multiply these two things together, I get this 3M by N measurement matrix. And so this is a key observation. This comes back to some linear algebra: there's something called the rank of a matrix, which basically says how many linearly independent rows or columns the matrix can have. In this case the rank, if I didn't know any better, could be very large, because M is probably big and N is probably pretty big. But when I write it this way, I can see that the rank of this matrix can be no bigger than four, because it's made up of something-by-4 times 4-by-N. So ideally the rank is at most four, and that observation is what leads us to this projective factorization idea. So let me call this measurement matrix M... well, maybe M is not a great letter; I notice in my book that I probably shouldn't call it M, so let's call it W. Okay, so again this is our singular value decomposition thing: I can take the SVD of W, so W equals U D V transpose, where U is 3M by N, D is N by N, and V transpose is N by N; that's how the whole thing gets put back together. And this D is supposed to be diagonal. If I look at the entries of D, what I should get is some stuff in the first four diagonal entries and then some very small numbers, because ideally this measurement matrix is formed by something that has only these four degrees of freedom. So here what I should get are four nonzero values and then I should get
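To make this rank observation concrete, here's a minimal numpy sketch (not from the lecture; the random cameras and points are purely illustrative): stacking the scaled projections of N points seen by M cameras gives a 3M by N matrix whose rank cannot exceed 4.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 20                            # 5 cameras, 20 points (illustrative sizes)

P = rng.standard_normal((3 * M, 4))     # stacked 3x4 camera matrices
X = rng.standard_normal((4, N))         # homogeneous 3D points

W = P @ X                               # 3M x N measurement matrix
print(W.shape)                          # (15, 20)
print(np.linalg.matrix_rank(W))         # 4, even though the matrix is 15 x 20
```

The matrix is much larger than rank 4 in both dimensions, but because it factors as (3M x 4)(4 x N), four is a hard ceiling on its rank.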
basically N minus 4 almost-zero values. And so what I can do is make a new matrix, call it D_4, which keeps just the four nonzero values and is zero everywhere else. This basically says: I know it should almost look like this, so I'm going to force it to look exactly like that. And now I can make an initial guess of this matrix of Ps as U times D_4: keeping just the first four columns of U gives 3M by 4, D_4 is 4 by 4, so the stacked cameras come out 3M by 4. And my collection of all the X's is going to be the first four rows of V transpose, so that's 4 by N. In this way I can form up candidates for an initial set of Ps and an initial set of X's; that is, a set of Ps and X's that are consistent with the measurement matrix. Now, due to noise in the measurements and incorrect correspondences, those trailing singular values may not be exactly equal to zero, but you can set them to zero and say this is the best that we can do. And again, just as we talked about before, there are some nitpicky things: you should normalize the 2D correspondences first, similar to what we did when estimating things like projective transformations. You want to center them at the origin and normalize them so they have average distance square root of 2 from the origin, and so on. The other thing I didn't mention is that in practice, I kind of assumed that I knew these lambdas, right? If I look back at this thing here, I can't even talk about doing the SVD of this matrix until I know what these lambdas are. And so what I
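Here's a hedged numpy sketch of that truncation step, again on synthetic data: take the SVD of W, keep the top four singular values, and read off candidate stacked cameras and homogeneous points.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 12
# a noiseless rank-4 measurement matrix, built from random cameras and points
W = rng.standard_normal((3 * M, 4)) @ rng.standard_normal((4, N))

U, d, Vt = np.linalg.svd(W, full_matrices=False)
P_hat = U[:, :4] * d[:4]    # 3M x 4: candidate stacked camera matrices
X_hat = Vt[:4, :]           # 4 x N: candidate homogeneous 3D points

# the rank-4 factorization reproduces W (up to the projective ambiguity H)
print(np.allclose(P_hat @ X_hat, W))    # True
```

With noisy data the trailing singular values wouldn't be exactly zero; truncating them is exactly the "set them to zero and call it the best we can do" step.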
could do is initialize with all the lambdas equal to one, then solve for the P_i's and the X_j's, then compute new lambdas, because now I have those estimates and I have the observed little x's, so I can figure out the closest lambdas that make the two sides match up, and then iterate until things stop changing. There are some details about how this whole process works, but it's not too hard to code up, and this is called the Sturm-Triggs algorithm. The overall algorithm is given in the book; there are a couple of little extra normalization steps and so on. So one way I think about this is that I'm bootstrapping my way up, iterating until everything stops changing. Another way to think about it, as an alternate approach, is that I can seesaw back and forth: estimate the P's from the X's, then re-estimate the X's from the P's. The point is that if I know two of the three things, the third one is easy to estimate. Maybe a better illustration is this: if I knew the cameras and the corresponding image points, I could figure out the corresponding X's out in space by triangulating. I push all those rays out into 3-space and find where the rays corresponding to each point intersect; that would be like this picture. So if I knew the cameras and the image points, I can find the 3D positions. Conversely, if I knew all the 3D point locations and the 2D point locations, I could figure out where the cameras were; that's just the resection from last lecture. So what you're doing is alternating resection and
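Here is one possible way to code up the iteration just described. This is a bare-bones sketch in the Sturm-Triggs style, under the lecture's assumption that every point is seen by every camera; the normalization steps are omitted and the function and variable names are my own, not from the book.

```python
import numpy as np

def sturm_triggs(x, n_iters=20):
    """Sketch of projective factorization (Sturm-Triggs style).

    x : (M, N, 3) array of homogeneous 2D observations x_ij,
        assuming every point is seen by every camera.
    Returns stacked cameras P_hat (3M x 4) and points X_hat (4 x N).
    """
    M, N, _ = x.shape
    lam = np.ones((M, N))                  # initialize all projective depths to 1
    for _ in range(n_iters):
        # build the measurement matrix W with the current depths:
        # column j of block row i is lambda_ij * x_ij
        W = (lam[:, :, None] * x).transpose(0, 2, 1).reshape(3 * M, N)
        U, d, Vt = np.linalg.svd(W, full_matrices=False)
        P_hat = U[:, :4] * d[:4]           # rank-4 factors: cameras...
        X_hat = Vt[:4, :]                  # ...and homogeneous points
        # re-estimate lambda_ij so that lambda_ij * x_ij best matches P_i X_j
        proj = (P_hat @ X_hat).reshape(M, 3, N).transpose(0, 2, 1)  # (M, N, 3)
        lam = np.sum(x * proj, axis=2) / np.sum(x * x, axis=2)
    return P_hat, X_hat
```

A fixed iteration count stands in for the "iterate until things stop changing" test; a real implementation would also normalize the image points and the depths each pass.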
triangulation, both of which are relatively easy things to do. Okay, so questions about the big picture? So again, at the end of this process I still have this ambiguity: I can still get very weird P's and X's. Normally what you get out of the end of this process is visually useless, in the sense that if you were to observe where these X's were in space, you would get some sort of bizarro set of points, like figure (d) here. So you need to do more to make your cameras and scene points match up with what you expect, but this is the first step; all the information that you really need is inherent in the P's and X's that come out of this step. The next thing is to find the best tweak to these matrices, this H that I put in the middle, to bring the camera matrices and points to something that I believe is true. One more thing I want to mention: there's a lot of work required to really do this robustly and do it well. Like I said, the structure-from-motion packages that you guys are going to be playing with, and that visual effects people use every day, are based on years and years of research about how to do this problem the right way. There's lots of nitpicky stuff that would make it difficult to build your own matchmover that does as well as boujou or SynthEyes or PFTrack or Voodoo. But one thing I do want to say is that so far, everything I've said doesn't take into account the fact that these cameras come from a continuous motion of a single camera. Right now, as far as this algorithm is concerned, these P matrices could have come from anywhere; there's no continuity between the cameras, and we can exploit that. The algorithm as I've described it is good for the problem where, for example, I take a bunch of tourist pictures of the Statue of Liberty and I want to know
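The triangulation half of that resection/triangulation alternation can be sketched with the standard linear (DLT) method; this is an illustrative implementation with my own names, not code from the lecture.

```python
import numpy as np

def triangulate(Ps, xs):
    """Linear (DLT) triangulation: recover a homogeneous 3D point X
    from a list of 3x4 camera matrices Ps and (u, v) observations xs.
    Each view contributes two rows of the homogeneous system A X = 0.
    """
    A = []
    for P, (u, v) in zip(Ps, xs):
        A.append(u * P[2] - P[0])   # u*(row3 . X) - (row1 . X) = 0
        A.append(v * P[2] - P[1])   # v*(row3 . X) - (row2 . X) = 0
    # X is the null vector of A: the last right singular vector
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1]

# illustrative usage: two cameras, one point, exact projections
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = np.array([1.0, 2.0, 5.0, 1.0])
xs = [(p[0] / p[2], p[1] / p[2]) for p in (P1 @ X, P2 @ X)]
Xh = triangulate([P1, P2], xs)
print(Xh / Xh[3])    # recovers X up to scale
```

This is the "push the rays out into 3-space and find where they intersect" step, written as a least-squares null-space problem.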
where the cameras were. In that case there's no temporal relationship between the pictures that were taken; I have no expectation that the cameras follow some nice smooth path. But in real life, a camera is moving along some trajectory in 3D space and I'm getting nice little samples along it, and I should exploit that fact. That's covered in the section called sequential or hierarchical updating; again, this is really where the secret sauce is, but I would generally call these chaining approaches. So here's a crude idea. I could say: I'm going to start with these two cameras, estimate the fundamental matrix between them, and use that to get a pair of camera matrices, P and P prime, or P1 and P2. Now I've seen a bunch of points in 3D space, and I know P2 and I know where those points were in image 2, so I can build rays out this way. This camera that I haven't looked at yet sees some of those same points over here, so I can project those guys back down and estimate P3. Then I bootstrap my way along: I keep on resectioning and triangulating to sequentially estimate these cameras as I move. The idea is that if the cameras are not moving very much, then estimating the correspondences and doing this pushing-forward and pulling-back process shouldn't be that hard to do. So if these are the corresponding images that you get, what you're doing is saying: I'm going to first work on this pair of images, then this pair of images, then this pair, and so on. You kind of push
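The resection step used in this chaining idea (estimating P3 from already-triangulated points) can likewise be sketched with a DLT; again an illustrative implementation with assumed names, not the lecture's code.

```python
import numpy as np

def resection(Xs, xs):
    """DLT resection: estimate a 3x4 camera matrix P from 3D-2D matches.

    Xs : (N, 4) homogeneous 3D points, xs : length-N list of (u, v) points.
    Each correspondence gives two rows of A p = 0 for the 12 entries of P.
    """
    A = []
    for X, (u, v) in zip(Xs, xs):
        A.append(np.concatenate([X, np.zeros(4), -u * X]))
        A.append(np.concatenate([np.zeros(4), X, -v * X]))
    # the camera is the null vector of A, reshaped to 3x4
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)
```

With exact correspondences and six or more points in general position, this recovers the camera matrix up to an overall scale, which is all a projective reconstruction needs.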
your calibration forward as you go along the sequence. One part where that gets a little complicated: it's certainly possible in a movie shot that I move the camera around, and the camera slows down and then speeds up again along its physical path. So instead of just moving along pair by pair, another thing I could do is choose some subset of the cameras. The first idea is what I'll call the sequential approach; what I'll call the hierarchical approach is to say: I'm going to choose this as my first camera, this camera as my second, and this camera over here as my third, the idea being to choose cameras far enough apart that I can robustly estimate everything. One issue is that when the cameras are really close together, it's hard to distinguish camera motion from a pure rotation. If the camera is not moving very much and there's only a thirtieth of a second between frames, there's not going to be much difference between a rotation plus translation of the camera versus a pure rotation; it's hard to tell those apart. So instead I would choose pairs of frames that are substantially far apart, so that I'd never confuse the motion with pure rotation, estimate their 3D positions, and then fill in the gap between them with triangulation and resection and so on. That idea is probably going to be more successful than pushing along one frame at a time, because the ambiguities and errors you accumulate frame by frame may be worse than if you choose a priori the best cameras out of the whole set and fill in the stuff in the middle. So this kind of keyframing approach is generally
probably a good idea, and when the camera is not moving very fast, maybe you could take those keyframes to be hundreds of frames apart, as opposed to taking every frame. The other thing I'll mention, only in passing, is that just as there's a fundamental matrix that relates every pair of images, there's also a concept called the trifocal tensor, which sounds fancy, but basically the trifocal tensor is something like F except that it relates triples of images. This thing is a lot harder to work with, certainly on paper, because instead of a relationship between correspondences in image 1 and image 2, you've got a mutual relationship between correspondences in image 1, image 2, and image 3, represented by a 3 by 3 by 3 tensor, so it's hard to visualize and hard to talk about on paper. But it is more robust, in the sense that triples are tied together more tightly than pairs are, and so you can apply the same sequential idea using the trifocal tensor between every triple of frames. Again, that process is mathematically a little hairy to write down, but it is also a very viable approach to these kinds of problems, and there's a well-known paper on threading trifocal tensors that talks about this. Okay, so this is the idea, and this is where I'm going to stop for the day. Basically, at this point we can assume that I have a good estimate of the camera matrices and a good estimate of the scene points, so that the error between every projection and its model is low. But we have these ambiguities: if I were to take all these camera matrices and multiply them by this, and all these scene
points and multiply them by that, where this is some 4x4 non-singular matrix, my reprojection distances would be exactly the same; I wouldn't be able to disambiguate which of these is correct. And so the next step is what's called Euclidean reconstruction, which basically means pushing our way up to something that we trust as a reconstruction. I'm going to save that and talk about it next time. Okay, so any questions about this first phase? All right, so let me shut stuff down here.
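This ambiguity is easy to verify numerically: for any non-singular 4x4 matrix H, replacing the cameras by P H-inverse and the points by H X leaves the measurement matrix unchanged. A small sketch with synthetic data (names and sizes purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
P = rng.standard_normal((3 * 4, 4))   # stacked camera matrices
X = rng.standard_normal((4, 10))      # homogeneous scene points
H = rng.standard_normal((4, 4))       # any non-singular 4x4 matrix

# (P H^-1) and (H X) reproduce exactly the same image measurements as (P, X)
W1 = P @ X
W2 = (P @ np.linalg.inv(H)) @ (H @ X)
print(np.allclose(W1, W2))    # True
```

Since every such H gives an equally valid reconstruction, the projective factorization alone can't pin down the "right" cameras and points; that's exactly what the Euclidean upgrade in the next lecture resolves.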
Info
Channel: Rich Radke
Keywords: visual effects, computer vision, rich radke, radke, cvfx, camera calibration, stereo rig, stereo rig calibration, projective reconstruction, structure from motion, matchmoving
Id: DDjfhYxqp3w
Length: 61min 7sec (3667 seconds)
Published: Mon Mar 31 2014