Lecture 15: Structure from Motion

Okay, so we're going to talk about a new topic called structure from motion. It falls under a bigger topic called shape from X, where the idea is to recover 3D shape from one or more images. This is a very big topic in computer vision. As we've been discussing, images are 2D while the world is 3D: an image is a projection from 3D to 2D, and we want to recover the 3D information. There are many methods to do that. Stereo is one; structure from motion, which we are going to talk about today, is another; and then there are shading and photometric stereo, texture, contours, silhouettes, and lots of other ways to recover 3D from 2D images. That's what humans do, because our visual system also projects 3D to 2D: the retina is like a plane, so we get 2D images, yet we are able to recover 3D and navigate in the 3D world. That's important. The applications are many. If you can recover 3D, you can do object recognition and recognize objects in 3D. You can do robotics: put a camera on a mobile robot so it can move around, navigate, and follow a path. You can do computer graphics, for example realistic videos, because you can generate new viewpoints. You can do image retrieval, and geo-localization, finding the exact location where a picture was taken. It is used in archaeology, to try to understand cities that were ruined or destroyed: what the culture used to be, what the structures and streets were, and so on. So there are lots of applications, and of course you can use 3D in sports. One good application is the Microsoft Kinect, which many of you use with the Xbox. The Kinect gives you 3D because it has a 3D sensor; that sensor is not vision based, it's a different kind of sensor, but that's what it is able to do. So here are some examples.
In this row we are showing gestures. These are the depth images, the 3D images, where the intensity shows how far each point is from the camera: this area is farther away than the person, this is closer, and so on. And these are the plain RGB images, the color images. The Kinect gives you both, and there is a lot of research going on into how to use this kind of video. It's called RGB-D, RGB plus depth, and when you include time you get 4D data, because in a video you have RGB-D at every time instant. These are some examples of different gestures from a very interesting data set, where the challenge is to recognize these different kinds of gestures. They are gestures you would normally use when playing video games on the Xbox: instead of using the controller, you use gestures, for a more realistic experience. Here are some more examples of these gestures. These are the RGB images, and there are corresponding depth images to be analyzed; the Kinect provides the depth images in addition to the RGB. As I showed you before, humans have different cues to recover 3D: stereo is one, motion is another, shading is a third, and there are lots of them, and what we perceive is an integration of all those cues. Now, if we mask out all the other cues, motion alone is still a pretty powerful cue for recovering 3D. This example is called a moving light display. If you look at these dots in a single image, they could be anything, but when you start playing the sequence, you can see that there is a person walking. That's a very powerful cue: we perceive and recover the 3D structure based on very minimal information.
All the other cues are actually masked out, which is very good. So what we are going to talk about today is how to write a computer program that takes images like these and recovers the 3D shape and also the 3D motion, so that you can then synthesize images from different viewpoints, as shown here. This is a very simple method, and it really works very well. The problem is: given optical flow or point correspondences, compute the 3D motion, which is the translation and the rotation, and the shape. That's called the structure from motion problem. It has been very popular for the last 20 years or even more, and many authors have tried to address it, because in its full generality it's a very difficult problem, even though there are nicer solutions now. In a way it started with Shimon Ullman at MIT, and then a whole series of people, almost every senior person from the famous groups, worked on this problem; there is a whole list of them, and some of you have heard names like Kanade. Even we have worked on it: one of my former PhD students worked on structure from motion about 10 to 15 years ago. So it's a very rich area, and recently it has made a lot of progress. I showed you this before and I'll show it again: the method from Microsoft called Photosynth has really had a big impact, and this is a nice video that explains what it does.

[Video: a Photosynth demonstration; the audio is only partly intelligible. The narrator talks about linking together whatever images are taken of a common environment into an emerging network of hyperlinked images, and calls it a very powerful idea.]
[Video, continued: when all these photos are arranged in a common environment, a point cloud of the scene is reconstructed, and markers appear on the screen showing where each photo was taken. Because many people's photos are combined, the collection becomes like a three-dimensional map: one can move around the space via the photos, group and navigate between images using their visual content, and zoom into close-ups to find other perspectives on the scene, or what's around the corner.]

Okay, so you got the idea. Even though this system does many different things, the basic, fundamental problem we are going to talk about is: given a set of images, how do we recover the 3D information? We are going to talk about the algorithm by Tomasi and Kanade. Kanade is the same Kanade as in the KLT tracker.
That's the tracker you are implementing: K is Kanade, L is Lucas, and T is Tomasi, so Tomasi also contributed to that tracker. The assumptions are as follows. The camera model is orthographic. This is a much simpler model than perspective, but they show that they can still recover 3D. As you know, the general model is perspective rather than orthographic, but perspective projection is nonlinear, because depth appears in the denominator, and that creates a problem. We assume we have P points and F frames (uppercase F), with three or more frames. The points whose 3D positions we recover must not all lie in the same plane; if they are coplanar, that will be a problem. And of course these points have been tracked through the images: you use your KLT tracker on SIFT points or Harris points to get tracks, and we use those tracks. This is a batch method: we have the tracks of all the points in all the frames, and we use all of them to recover the 3D structure and also the camera motion for every frame. We don't require camera calibration if we accept that the 3D structure is recovered only up to a scale factor, so without calibration you can't tell its absolute size. That is one limitation, but not needing camera calibration is nice. So the input is frames like these; here we are showing four frames and the KLT tracks, which is what you are getting in your program. The KLT tracks give you the image locations of these points in the different frames, and we are going to represent them by u and v.
These are the x and y coordinates. They are not optical flow; that's just the notation the paper uses, so we'll stay with it, but be aware of that. Here f is the frame number, from 1 to F, and p is the point index. So u_fp is the x coordinate of point p in frame f, and v_fp is the y coordinate of point p in frame f. That's the general notation we are going to use. We take these tracks and put them in a matrix W. The first row holds the x locations of all the points in the first frame: the first index is the frame number and the second index is the point, so this row is the first frame, with the first point, the second point, the third point, up to point P. The second row is the second frame, then the third frame, and so on down to the last frame F. The lower part holds the y coordinates, organized the same way. So we simply organize these points as a matrix, and you can generate it from the output of your KLT tracker. In short form, the upper part of this matrix is U, which has F rows and P columns, and the lower part is V, another F rows and P columns; so W is 2F by P. The first thing we do is normalize the image coordinates by finding the mean in every frame: we add up the x coordinates of all P points in frame f and divide by P, which gives the average x coordinate a_f of frame f. We do this for all the frames, and similarly for the y coordinates, giving b_f. Then we subtract the mean from each coordinate and call the result ũ_fp = u_fp - a_f for the x coordinate and ṽ_fp = v_fp - b_f for the y coordinate of point p in frame f. Very simple.
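A minimal NumPy sketch of this step, assuming the KLT output has already been arranged into two F×P arrays `u` and `v` (that input layout, the function name `build_measurement_matrix`, and the numbers in the example are made up for illustration):

```python
import numpy as np

def build_measurement_matrix(u, v):
    """Stack KLT tracks into W = [U; V] and mean-normalize each row.

    u, v: arrays of shape (F, P); u[f, p] and v[f, p] are the x and y
    image coordinates of point p in frame f (hypothetical input layout).
    Returns W (2F x P), W_tilde (2F x P) and the per-frame means a, b.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    assert u.shape == v.shape, "u and v must both be F x P"
    W = np.vstack([u, v])            # rows 0..F-1: x coords, rows F..2F-1: y coords
    a = u.mean(axis=1)               # a_f: mean x coordinate in frame f
    b = v.mean(axis=1)               # b_f: mean y coordinate in frame f
    u_tilde = u - a[:, None]         # subtract each frame's mean from its row
    v_tilde = v - b[:, None]
    W_tilde = np.vstack([u_tilde, v_tilde])
    return W, W_tilde, a, b

# Example with F = 2 frames and P = 3 tracked points (made-up numbers).
u = [[10.0, 20.0, 30.0],
     [12.0, 22.0, 32.0]]
v = [[5.0, 7.0, 9.0],
     [6.0, 8.0, 10.0]]
W, W_tilde, a, b = build_measurement_matrix(u, v)
print(W.shape)                       # (4, 3)
print(a.tolist(), b.tolist())        # [20.0, 22.0] [7.0, 8.0]
print(W_tilde[0].tolist())           # [-10.0, 0.0, 10.0]
```

Every row of `W_tilde` sums to zero, which is exactly what "mean-normalized" means here.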
Remember this as equation (a). The geometry is very easy and very intuitive. Say we have a point p in 3D, with position s_p. We have a 3D world coordinate system: suppose my origin is here, this is X, this is Y, and this is Z. I can find the 3D coordinates of point p with respect to this world coordinate system. Then I'm taking a video: these are the pictures at different times, t, t+1, t+2. I attach a coordinate system to each image, that is, to the camera. Its unit vectors are i, j and k, and I have one such system for every frame, which is why they carry the index f: i_f, j_f, k_f. This coordinate system has its origin here, and there is a translation t_f between the world coordinate system and the camera of frame f, and likewise for frames f+1, f+2, and so on. That's the geometry, a very simple way to capture this, and we are going to use it. So we have a 3D point with x, y and z coordinates, three numbers, and unit coordinate vectors attached to each image we take from the camera. As you know, if I take two of these unit vectors and compute their cross product, I get the third one; that's the idea of the XYZ axes. If you take (1, 0, 0) and (0, 1, 0) and compute their cross product, you get (0, 0, 1); that's what you learned about vectors. So i is the unit vector for the x axis, j for the y axis, and k = i × j for the z axis. That's why we want this coordinate system. Now consider orthographic projection.
As you know, under orthographic projection there is no focal length and no division by Z; it's a very simple idea. You take a point in 3D and want to express it in terms of the image coordinates u and v. The world coordinate system and the camera coordinate system are related by a translation of the origin, so we take care of that by subtracting the translation t_f from the 3D point, which is given in world coordinates; now we have the point relative to the camera. Then we project onto the x axis i_f to get the x coordinate, and onto the y axis j_f to get the y coordinate: u_fp = i_f^T (s_p - t_f) and v_fp = j_f^T (s_p - t_f), where i_f × j_f = k_f is the z axis. That's the orthographic projection model: as you know, a 3D point (X, Y, Z) projects to x = X, y = Y, where lowercase x, y are image coordinates and uppercase X, Y, Z are world coordinates. The only difference here is the translation between the world and image coordinates, which we took care of by subtracting it. Now let's look at this again. These are the mean-normalized coordinates of point p in frame f: ũ_fp is the original x coordinate with the mean subtracted. For u_fp we use what we just derived: subtract the translation of the camera origin from the 3D point and project onto the x axis, as we explained with this figure. We also know that the mean a_f is the sum of the x coordinates of all the points in that frame divided by the number of points, as we just explained. So we use both of these to express ũ_fp in terms of the world coordinates.
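The projection model fits in a tiny function. This is a sketch under the assumptions above (orthographic camera, unit axes i and j, camera origin t); the function name `orthographic_project` and the example camera are made up for illustration:

```python
import numpy as np

def orthographic_project(s, t, i, j):
    """Orthographic image coordinates of world point s for a camera whose
    origin is at t and whose image axes are the unit vectors i and j.
    Implements u = i . (s - t) and v = j . (s - t); depth does not appear."""
    d = np.asarray(s, float) - np.asarray(t, float)   # point relative to the camera origin
    return float(np.dot(i, d)), float(np.dot(j, d))

# Hypothetical camera: rotated 90 degrees about the world Z axis.
i = np.array([0.0, 1.0, 0.0])     # camera x axis in world coordinates
j = np.array([-1.0, 0.0, 0.0])    # camera y axis in world coordinates
k = np.cross(i, j)                # optical axis, k = i x j
t = np.array([1.0, 2.0, 10.0])    # camera origin (translation t_f)
s = np.array([3.0, 5.0, 7.0])     # a world point s_p

print(np.allclose(k, [0.0, 0.0, 1.0]))         # True
print(orthographic_project(s, t, i, j))        # (3.0, -2.0)
```

Moving the point along k leaves (u, v) unchanged, which is the sense in which depth drops out of orthographic projection.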
Including the translation, it becomes the following. By definition, u_fp is i_f^T (s_p - t_f), and we substitute the same definition inside a_f. There we have all P points, so to avoid confusion we use a different index q for that sum, but it's exactly the same expression. So now we have two kinds of terms. From u_fp we get i_f^T s_p minus i_f^T t_f, and from the mean we get the sum over q of i_f^T s_q and of i_f^T t_f, divided by P. Look at the last of these terms: i_f^T t_f does not depend on q, so summing it P times and dividing by P gives back exactly i_f^T t_f, which cancels the -i_f^T t_f coming from u_fp. Both disappear, and what remains is i_f^T s_p from the first part and, from the second, i_f^T times (1/P) times the sum of the s_q, with i_f taken outside the sum. Now one more thing we can choose: we can put the origin of the world coordinate system at the mean of the 3D points. If the origin is located at the centroid of the world points, then the sum of the s_q is zero, because that is the origin; that's why we pick our origin there. So this term also disappears, depending on how we select the origin of the world coordinates, and that is easy to do: take all the world points, find their mean, and subtract it from each of them, so that their sum becomes zero. Then all of this simplifies to a very simple expression.
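For reference, here is the same derivation written out as equations, in the notation used above (s_p the 3D point, t_f the camera translation, i_f the camera x axis, a_f the per-frame mean of the x coordinates):

$$
\begin{aligned}
\tilde u_{fp} &= u_{fp} - a_f
  = \mathbf i_f^T(\mathbf s_p - \mathbf t_f) - \frac{1}{P}\sum_{q=1}^{P}\mathbf i_f^T(\mathbf s_q - \mathbf t_f) \\
&= \mathbf i_f^T\mathbf s_p - \mathbf i_f^T\mathbf t_f - \mathbf i_f^T\Big(\frac{1}{P}\sum_{q=1}^{P}\mathbf s_q\Big) + \mathbf i_f^T\mathbf t_f \\
&= \mathbf i_f^T\Big(\mathbf s_p - \frac{1}{P}\sum_{q=1}^{P}\mathbf s_q\Big) \\
&= \mathbf i_f^T\mathbf s_p \qquad \text{when } \frac{1}{P}\sum_{q=1}^{P}\mathbf s_q = \mathbf 0 .
\end{aligned}
$$

The same steps with j_f and b_f give the y coordinate.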
It says that the mean-normalized image x coordinates are given by ũ_fp = i_f^T s_p: take the 3D point and project it onto the unit vector in the x direction. The same holds for the y coordinate, with j instead of i: ṽ_fp = j_f^T s_p. So the x and y coordinates are obtained from the 3D points using the unit vectors of that particular frame, which is what the geometry describes. We can write this in terms of a matrix W̃, as we did for W. There we put the unnormalized coordinates u and v; here we put the normalized coordinates ũ and ṽ. W̃ has the same layout as W: rows for the first frame, second frame, third frame, up to the last frame; columns for the first point, second point, and so on; x coordinates on top and y coordinates below. The only difference is that we have subtracted the mean from those coordinates. Now, this is important, so listen carefully. If you look at these two equations, W̃ consists of Ũ, the x coordinates, and Ṽ, the y coordinates, and we can take all P points and F frames and write the equations as one matrix product. If I take the first point and multiply it with i_1, I get the x coordinate of the first point in frame 1; if I multiply it with j_1, I get its y coordinate. Since I have P points and F frames, one matrix holds the P points in 3D and the other holds the camera axes, which are the rows of the rotation matrices for the different frames. Multiplying them gives W̃, exactly following these equations: project the 3D point onto i and you get the x coordinate; project it onto j and you get the y coordinate.
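To check the factorization numerically, here is a small synthetic sketch: random cameras and points (all hypothetical) are projected orthographically, each row is mean-normalized, and the result is compared with R S, where R stacks all the i_f rows above all the j_f rows to match the layout of W:

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation matrix built from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))        # fix the column signs
    if np.linalg.det(q) < 0:           # make it a proper rotation (det = +1)
        q[:, 2] = -q[:, 2]
    return q

rng = np.random.default_rng(0)
F, P = 5, 8

S = rng.normal(size=(3, P))            # 3D points s_p as columns
S -= S.mean(axis=1, keepdims=True)     # world origin at their centroid

rotations = [random_rotation(rng) for _ in range(F)]     # rows: i_f, j_f, k_f
translations = [rng.normal(size=3) for _ in range(F)]    # t_f

# Orthographic projection of every point in every frame.
U = np.empty((F, P))
V = np.empty((F, P))
for f, (Rf, tf) in enumerate(zip(rotations, translations)):
    U[f] = Rf[0] @ (S - tf[:, None])   # u_fp = i_f . (s_p - t_f)
    V[f] = Rf[1] @ (S - tf[:, None])   # v_fp = j_f . (s_p - t_f)

# Mean-normalize each row to get W_tilde (2F x P).
W_tilde = np.vstack([U - U.mean(axis=1, keepdims=True),
                     V - V.mean(axis=1, keepdims=True)])

# R stacks all i_f rows above all j_f rows (2F x 3), matching the layout of W.
R = np.vstack([np.array([Rf[0] for Rf in rotations]),
               np.array([Rf[1] for Rf in rotations])])

print(W_tilde.shape, R.shape, S.shape)   # (10, 8) (10, 3) (3, 8)
print(np.allclose(W_tilde, R @ S))       # True
```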
This holds for all the points, so we write it for all of them together as W̃ = R S. The first factor, R, holds the x and y unit vectors for every frame, as I showed you in the figure: i and j for the first frame, i and j for the second frame, the third frame, up to frame F. These tell you how the camera is oriented with respect to the world coordinates. We only need i and j, because if you know i and j you can find k. How? With a cross product. The second factor, S, holds the 3D coordinates of the points. That's really neat. It's a pretty simple manipulation, but the amazing thing is that it really works, and that's why this paper has more than 2,000 citations. It was a long time ago, and there are many better methods now, but for understanding and learning this is a really good paper. It shows that to solve a real problem in computer vision, you can take a mathematical tool that people have known for many, many years and apply it in the right place. That's why it's important to know the basics and the math; otherwise you'll have a lot of problems. So this formulation writes W̃ as the product of two components: R, which captures the rotation of all the frames with respect to the world coordinates and consists of the i and j vectors, and S, the 3D coordinates of the P points. That is essentially the structure from motion problem: we don't know R and S, but we are given W̃, because you have the KLT tracks, the position of each point in every frame.
From those you can compute W, subtract the means, and get W̃. So, given W̃, we want to find R and S. R is the rotation, the motion with respect to the world coordinates, and S is the 3D coordinates of the P points. Here again is the idea of rank. As we've been saying, some key concepts are very simple and are used again and again in this course, and rank is one of them; we talked about rank for the fundamental matrix, and here is another use of a rank constraint. S is a matrix, so it has a rank, and so does R. The rank of S is 3, because its columns are 3D points, and in 3D you can have at most three linearly independent vectors. It can be less than three if the points lie in one plane, in which case it would be two, but we assume they are not coplanar. So any point in 3D is described by three numbers X, Y, Z, and the rank of S is 3. The consequence is that the rank of the product R S is at most 3: when you multiply two matrices, the rank of the product is at most the smaller of their two ranks. So they come up with this theorem, which is very simple and very powerful: without noise, the registered measurement matrix W̃ has rank at most three. That's what I just showed you: it cannot be more than three, because W̃ is the product of two matrices and the maximal rank of S is three.
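The rank theorem is easy to check on synthetic data. This sketch (random, hypothetical cameras and points) uses NumPy's SVD-based `np.linalg.matrix_rank`: noise-free W̃ = R S has rank 3 for general 3D points, and rank 2 when all the points are coplanar:

```python
import numpy as np

rng = np.random.default_rng(1)
F, P = 6, 20

def random_camera_rows(rng, F):
    """Stack i_f (top F rows) and j_f (bottom F rows) of F random orthonormal frames."""
    I, J = [], []
    for _ in range(F):
        q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # q has orthonormal columns
        I.append(q[:, 0])
        J.append(q[:, 1])
    return np.vstack([np.array(I), np.array(J)])     # 2F x 3

R = random_camera_rows(rng, F)

S = rng.normal(size=(3, P))
S -= S.mean(axis=1, keepdims=True)                   # centroid at the origin
print(np.linalg.matrix_rank(R @ S))                  # 3: general 3D points

S_planar = S.copy()
S_planar[2] = 0.0                                    # all points in the plane Z = 0
print(np.linalg.matrix_rank(R @ S_planar))           # 2: coplanar points
```

The coplanar case is why the method assumes the points are not all in one plane.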
Now, just a quick review. As we discussed for the fundamental matrix, rank is related to the idea of linear independence. Vectors v_1, ..., v_n are linearly dependent if we can write a_1 v_1 + ... + a_n v_n = 0 with scalars a_1, ..., a_n that are not all zero; if not, they are linearly independent. We are interested in how many linearly independent vectors there are in a particular vector space. In 3D there are three linearly independent vectors, for example (1, 0, 0), (0, 1, 0) and (0, 0, 1), the x, y and z axes. The column rank of a matrix is the maximum number of linearly independent column vectors, and the row rank is the maximum number of linearly independent row vectors. In other words, the column rank of A is the dimension of the column space of A, and the row rank of A is the dimension of its row space; the two are always equal, and that common value is the rank. We have talked about this before; this is just a review. You can also find the rank of a matrix by reducing it to row echelon form, as you do in Gaussian elimination before back substitution: take the matrix A and make it upper triangular, so that all the elements below the diagonal are zero. For example, multiply the first row by 2 and add it to the second row, so that the first entry of the second row becomes 0. The first row, 1 2 1, does not change, but the second row becomes 0 1 3. Then, in another operation, multiply the first row by -3 and add it to the third row. Again the first row doesn't change, but the third row gets a zero in its first position. Now we already have zeros in the first column, and next we want a zero below the diagonal in the second column.
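The elimination procedure can be sketched in plain Python. This is my own illustration, not MATLAB's `rank` (which, like NumPy's `matrix_rank`, uses the SVD instead); the 3×3 matrix is hypothetical, chosen so that the row operations match the ones described (first row 1 2 1, second row reducing to 0 1 3):

```python
from fractions import Fraction

def rank_by_row_echelon(rows):
    """Rank of a matrix via Gaussian elimination to row echelon form.

    Uses exact rational arithmetic (Fraction) so that "is this entry zero?"
    has a clear answer; the rank is the number of pivot (nonzero) rows.
    """
    A = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(n_cols):
        # find a row at or below pivot_row with a nonzero entry in this column
        pivot = next((r for r in range(pivot_row, n_rows) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[pivot_row], A[pivot] = A[pivot], A[pivot_row]
        # add multiples of the pivot row to zero out the entries below it
        for r in range(pivot_row + 1, n_rows):
            factor = A[r][col] / A[pivot_row][col]
            A[r] = [x - factor * p for x, p in zip(A[r], A[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return pivot_row, A

# Hypothetical example whose third row is a combination of the first two.
A = [[1, 2, 1],
     [-2, -3, 1],
     [3, 5, 0]]
rank, echelon = rank_by_row_echelon(A)
print(rank)                                          # 2
print([[int(x) for x in row] for row in echelon])    # [[1, 2, 1], [0, 1, 3], [0, 0, 0]]
```

Exact fractions avoid the floating-point question of when a tiny entry should count as zero, which is exactly the issue that makes SVD-based rank (with a tolerance) the practical choice for real, noisy data.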
To get it, we add the second and third rows: that entry becomes zero, and in fact the whole third row becomes zero, while the first and second rows stay as they are. Now the matrix is in row echelon form, because all the elements below the diagonal are zero, and only two rows are nonzero, so there are only two linearly independent rows and the rank of this matrix is two. That's a small example to convince you that you can find the rank of any matrix this way, but there are quicker ways to do it in MATLAB. Okay, so now we have a good theoretical basis for solving a very important problem: given the KLT tracks, we want to find the 3D locations of the points and the rotation from one frame to another, that is, the structure, which is the 3D points, and the motion. The motion consists of both translation and rotation, and the easiest part is finding the translation. We already have equations for the rotation, the i and j vectors for every frame, so now we'll talk about translation. It's actually very easy: it's just the mean of the points in every frame, which is amazing, and I'll show you how to compute it. We start from the normalized x coordinate of point p in frame f and rewrite it by bringing the mean to the other side: u_fp = ũ_fp + a_f, the normalized coordinate plus the mean. We also know that the normalized x coordinate is obtained by projecting the 3D point onto the unit vector i, ũ_fp = i_f^T s_p, so we substitute that in.
We also know from equation (c) that the unnormalized image x coordinates are obtained by subtracting the translation from the 3D point and projecting onto i: u_fp = i_f^T (s_p - t_f) = i_f^T s_p - i_f^T t_f. This is what I showed you in the geometry. So now we have two expressions for u_fp, and we can compare them. The first terms are the same, which implies that a_f = -i_f^T t_f, and similarly b_f = -j_f^T t_f. Here t_f is the translation vector of frame f, so we are projecting the translation vector onto the x axis i_f, with a minus sign, and that is obtained simply as the average of the x coordinates of the points, which is very interesting. So to find the translation, just average the x coordinates to get the translation in x, and average the y coordinates to get the translation in y. That's the important thing: we can find the translation, projected onto the image plane, trivially, by computing the average. Now we can look at W a little differently. The image x coordinate of point p in frame f is obtained by projecting the 3D point s_p onto i_f and adding the translation a_f, u_fp = i_f^T s_p + a_f, and similarly v_fp = j_f^T s_p + b_f for the y coordinate. Taking point 1, point 2, point 3 and so on in all the frames, we can write the whole W matrix in this form: W = R S + t e_P^T, with the big rotation matrix R, the structure matrix S, and a translation vector t = (a_1, ..., a_F, b_1, ..., b_F)^T for all the frames, multiplied by e_P^T = (1, ..., 1), a row of P ones. As you remember, W covers P points and F frames, and for every point we have two numbers, the x and y coordinates. The first part of W holds the x coordinates, so it has F rows.
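A synthetic check of both claims, assuming the world origin sits at the centroid of the points (cameras, points and the variable names are hypothetical): the per-frame means of the image coordinates equal -i_f·t_f and -j_f·t_f, and W = R S + t e_P^T holds exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
F, P = 4, 10

S = rng.normal(size=(3, P))
S -= S.mean(axis=1, keepdims=True)           # world origin at the centroid of the points

I = np.empty((F, 3))                         # row f: camera x axis i_f
J = np.empty((F, 3))                         # row f: camera y axis j_f
T = 5.0 * rng.normal(size=(F, 3))            # row f: camera translation t_f
for f in range(F):
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # orthonormal columns
    I[f], J[f] = q[:, 0], q[:, 1]

# Orthographic projection of every point: u_fp = i_f.(s_p - t_f), v_fp = j_f.(s_p - t_f)
i_dot_t = np.sum(I * T, axis=1)              # i_f . t_f for each frame
j_dot_t = np.sum(J * T, axis=1)              # j_f . t_f for each frame
U = I @ S - i_dot_t[:, None]
V = J @ S - j_dot_t[:, None]
W = np.vstack([U, V])                        # 2F x P

# The per-frame means of the image coordinates are the projected translations.
a, b = U.mean(axis=1), V.mean(axis=1)
print(np.allclose(a, -i_dot_t), np.allclose(b, -j_dot_t))   # True True

# And W = R S + t e_P^T, with R = [I; J] (2F x 3) and t = [a; b] (2F x 1).
R = np.vstack([I, J])
t = np.concatenate([a, b])[:, None]
e_P = np.ones((1, P))
print(np.allclose(W, R @ S + t @ e_P))       # True
```

If the points were not centered, the means would pick up an extra i_f·(centroid) term, which is why the choice of world origin matters.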
It also has P columns, and the second part holds the y coordinates, again F rows and P columns, so the dimension of W is 2F by P. The matrix R contains i_f and j_f for all the frames; each of these vectors has 3 components, so R has 3 columns and 2F rows. The structure matrix S has P points, each with three coordinates, so it's 3 by P: three rows and P columns. If you multiply a 2F by 3 matrix by a 3 by P matrix, you get 2F by P, the same dimension as W. Then we add the translation: the x translation for frame 1 is a_1, for frame 2 it's a_2, and so on for all F frames, followed by the y translations b_1 to b_F. So t has 2F numbers, F for x and F for y: 2F rows and 1 column. Multiplying it by the row of ones, which has 1 row and P columns, also gives 2F by P. So the whole thing is consistent: this one formula captures the image coordinates, the 3D world coordinates, the rotation matrix and the translation. It's pure linear algebra, nothing more. As we said, the projected camera translation can be computed just from the means: the mean of the x coordinates gives the x translation and the mean of the y coordinates gives the y translation, no problem. Now let's look at the result again: if there is no noise, W̃ must have rank at most 3, as I explained. That's the ideal condition. When noise corrupts the images, we have a problem: W̃ will in general no longer have rank 3, and its rank can be higher. So we want to extend this main result.
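A quick illustration of what noise does to the rank, assuming small Gaussian tracking errors (the 0.05-pixel standard deviation and all the synthetic data are made up): the noisy W̃ becomes full rank, but its singular values still show three large values followed by much smaller ones:

```python
import numpy as np

rng = np.random.default_rng(3)
F, P = 8, 30

# Noise-free W_tilde = R S: random orthonormal camera axes and centered 3D points.
frames = [np.linalg.qr(rng.normal(size=(3, 3)))[0] for _ in range(F)]
R = np.vstack([np.array([q[:, 0] for q in frames]),    # i_f rows (top F rows)
               np.array([q[:, 1] for q in frames])])   # j_f rows (bottom F rows)
S = 10.0 * rng.normal(size=(3, P))
S -= S.mean(axis=1, keepdims=True)
W_tilde = R @ S

# Simulate small tracking errors (assumed Gaussian) and re-center each row.
noisy = W_tilde + rng.normal(scale=0.05, size=W_tilde.shape)
noisy -= noisy.mean(axis=1, keepdims=True)

sv = np.linalg.svd(noisy, compute_uv=False)            # singular values, descending
print(np.linalg.matrix_rank(W_tilde))                  # 3
print(np.linalg.matrix_rank(noisy))                    # 16, i.e. 2F: noise makes it full rank
print(sv[:5] / sv[0])                                  # compare the 4th value with the first three
```

The gap between the third and fourth singular values is what makes a rank-3 approximation via the SVD a sensible way to deal with noisy measurements.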
theoretically correct. But when you have real measurements, you may have small errors in the x, y coordinates of the KLT points, and that creates a problem, because the result assumes precise coordinates; if your image resolution is not enough, you may have problems. So what we are going to do first is this: given the matrix W, we want to find the rotation and we want to find the shape. That means we have one matrix and we want to factor it into two matrices, and whenever you have this case you use what is called singular value decomposition: you take a matrix and decompose it into three matrices. We used this before for the fundamental matrix. So we take W tilde and decompose it into these three matrices: W tilde is 2F by P, the first factor, O1, is 2F by P, the middle one, Sigma, is P by P and holds the singular values, and the last one, O2, is P by P. Singular value decomposition is again a very intuitive idea, very simple but very powerful, and it is used in many different places. The idea is that we can take a matrix A, which is M by N, and break it into these three matrices. A can be a rectangular matrix, M by N, so the first factor is M by N, the second is N by N and the third is N by N, and in MATLAB you just call svd on a matrix and it gives you these three matrices, which is very convenient. The middle one is a diagonal matrix: it has elements only on the diagonal, and these are the singular values, which are the square roots of the eigenvalues of A transpose A. The vectors in the other two matrices (the columns of the first and the rows of the third) are orthonormal, which means that if you take two of these vectors and compute their dot product, you get zero if they are different vectors and one if they are the same vector. That's a property of these matrices. So if we don't have noise, it is easy: we do the SVD and we find the rotation and the structure. But when we
have noise, we have to come up with some way to deal with it. So we take W tilde and compute its SVD, which gives us these three matrices. If there is no noise we are done, but if there is noise we may have a problem. So we take this matrix O1 and break it into two matrices, O1 prime and O1 double prime. The way we do it is: we take the first three columns of O1 and call them O1 prime, and the remaining columns, P minus 3 of them, we call O1 double prime. We need the first three because the rank is 3; the others are not useful. Both parts have 2F rows. Similarly, we break the Sigma matrix into two parts, Sigma prime and Sigma double prime: since it is a diagonal matrix, Sigma prime is the upper-left 3 by 3 block with the first three singular values, and Sigma double prime is the remaining P minus 3 by P minus 3 block. Then we do the same thing for O2: we break it into O2 prime and O2 double prime, which have the first 3 rows and the remaining P minus 3 rows. We do all this to deal with the noise, because in the ideal condition the singular values in Sigma double prime should be zero, but due to noise they may not be zero, and then the rank is not three. So we want to impose that the rank has to be three. From the way we have partitioned O1, Sigma and O2, we can express W tilde as the sum of two terms: O1 prime Sigma prime O2 prime, plus O1 double prime Sigma double prime O2 double prime. It's a very simple thing to do. The useful information is in the first term, so we keep it and just ignore the second one, and that's it. This is the best rank-3 approximation of W tilde, which we will call W hat, and that
is what will tell us the rotation for all these frames and also the structure, the S matrix, and we are done. So the statement is: the best possible shape and rotation estimate is obtained by considering only the three greatest singular values of W tilde, together with the corresponding left and right singular vectors. That's what we did. Now, to find the rotation, we take O1 prime times the square root of Sigma prime, and that gives the rotation; for the structure, we take the square root of Sigma prime times O2 prime, and that gives the structure. Multiply them together and you get W hat, and that's it, done. So using the W matrix, which has your KLT points, you can do this and get the 3D motion, the rotation and the translation in every frame, and the locations of these points in 3D, which is pretty interesting, and it is a very simple method. So this is the rotation matrix and this is the shape matrix. One thing here is that this decomposition is not unique: you can do something on top of it to make it unique, but we are not going to get into that; we will assume all this is true and we are done. So let me show you some results. You can actually try this yourself, since you have KLT; it is essentially a one-line program in MATLAB. We have these frames, and these are the feature points, about 400 of them, which they detected, and for this sequence they can estimate the three rotations very accurately. If you look carefully, there are actually two curves here, and they are so similar that they are hard to see: one curve is the ground truth and the other is computed, and you cannot really distinguish them, but if you zoom in you may see the dotted curve. So this is the estimate of one rotation over the different frames, then another rotation, and then the third rotation, and compared to the ground truth they are very
very close. This plot shows the difference between the ground truth and the computed values, and it is around 0.1 degrees or so, a very small difference in this case. Then, if you look here, for this sequence they can recover these 3D points. And this shows an interesting fact: for this house they actually measured, in inches, the dimensions of the different sides of the house. Here one number is the actual ground truth and the other number is what the algorithm gives, and as you see they are very close: this one was 76 and the algorithm gives 75.7; this one was 84 and the algorithm gives 94.1; this one is 53 and 53; and so on for all these numbers, which is pretty impressive. So you can actually recover the structure using this method, and once you have it, you can use these images to get a 3D visualization of this house from different viewpoints and so on. These are the KLT tracks you will have. Here is another example, the one I showed you earlier: you take these images, get the tracks, and then you can synthesize new images like this. So that's it. This is a very nice paper you can look at; it was published in 1992 and describes this method. The book by Szeliski also has a nice section on it.
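The factorization described in this lecture fits in a few lines of NumPy (Python here, rather than the MATLAB mentioned in the lecture). This is a minimal sketch, assuming an orthographic camera, a 2F x P matrix W whose first F rows are x coordinates and last F rows are y coordinates, and a world origin at the centroid of the points (so the per-row mean is the translation). The helper `synthetic_measurements` is hypothetical: it fabricates tracks for checking the code and is not KLT output. The sketch stops at the rank-3 factorization and does not resolve the non-uniqueness mentioned in the lecture, so R_hat and S_hat are determined only up to an invertible 3 x 3 matrix.

```python
import numpy as np


def factorize(W):
    """Rank-3 factorization of a 2F x P measurement matrix (orthographic camera).

    Rows 0..F-1 hold the x coordinates of the P tracked points in each frame,
    rows F..2F-1 hold their y coordinates. Returns the motion estimate R_hat
    (2F x 3), the shape estimate S_hat (3 x P), the per-row translation T
    (2F x 1) and all singular values of the registered matrix W - T.
    """
    # Translation: assuming the world origin is at the points' centroid, the
    # mean image coordinate in each row is that frame's x or y translation.
    T = W.mean(axis=1, keepdims=True)
    W_tilde = W - T  # registered measurement matrix, rank <= 3 without noise

    # Thin SVD: O1 is 2F x k, sigma has k entries in descending order,
    # O2 is k x P, with k = min(2F, P).
    O1, sigma, O2 = np.linalg.svd(W_tilde, full_matrices=False)

    # Keep only the three largest singular values and their singular vectors
    # (O1 prime, Sigma prime, O2 prime); the rest are treated as noise.
    O1_p = O1[:, :3]
    sqrt_sigma_p = np.diag(np.sqrt(sigma[:3]))
    O2_p = O2[:3, :]

    R_hat = O1_p @ sqrt_sigma_p  # 2F x 3 motion estimate
    S_hat = sqrt_sigma_p @ O2_p  # 3 x P shape estimate
    return R_hat, S_hat, T, sigma


def synthetic_measurements(F, P, noise=0.0, seed=0):
    """Hypothetical test helper: orthographic views of P random 3D points."""
    rng = np.random.default_rng(seed)
    S = rng.normal(size=(3, P))
    S -= S.mean(axis=1, keepdims=True)  # world origin at the centroid
    W = np.empty((2 * F, P))
    for f in range(F):
        Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
        i_f, j_f = Q[0], Q[1]                         # orthonormal camera axes
        a_f, b_f = 10.0 * rng.normal(size=2)          # image-plane translation
        W[f] = i_f @ S + a_f                          # x coordinates, frame f
        W[F + f] = j_f @ S + b_f                      # y coordinates, frame f
    return W + noise * rng.normal(size=W.shape)


# Usage: noise-free tracks give an exact rank-3 factorization of W - T.
W = synthetic_measurements(F=20, P=50)
R_hat, S_hat, T, sigma = factorize(W)
print("leading singular values:", np.round(sigma[:5], 6))
print("W - T reproduced:", np.allclose(R_hat @ S_hat, W - T))
```

With noise-free tracks, the fourth singular value is essentially zero and R_hat @ S_hat reproduces W - T to floating-point precision. With noisy tracks, the product is the best rank-3 approximation W hat, and its Frobenius distance from W - T equals the square root of the sum of the discarded squared singular values.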
Info
Channel: UCF CRCV
Views: 40,405
Id: zdKX7Xo3Cb8
Length: 54min 24sec (3264 seconds)
Published: Mon Nov 26 2012