Lecture 12: Camera Model

Captions

So let's start. We will talk about a new topic today called camera model and calibration, but let me first make sure that I have the internet connection. OK, good.

So we want to talk about camera calibration, and that basically involves finding the intrinsic and extrinsic parameters of a camera. When you take a picture, you want to know where the camera was located, how it was oriented, and what its intrinsics were. Those are the extrinsics and the intrinsics: the intrinsics are the focal length and all other related things. So the extrinsics are the 3D location and the orientation of the camera, and the intrinsics are the focal length, the size of the pixels, and so on.

It has a lot of applications. This is one example: let's say this is the image, which we will call the source image, and this is the target image, so these are two pictures. Let's say we want to take this sign here and put it in this image such that it looks very realistic, so that it looks like the sign was already there when we took the picture. So it will look like that, and it can also have shadows; as you see, now we have a shadow of the sign here as well. If you look at this picture it looks very real, but actually the sign wasn't there. Now how do we do that? In order to do this we need to talk about camera calibration, the camera model, and so on. OK, so that's one application of this.

We have some more examples. By the way, do you know where this picture was taken? Which beach? Cocoa Beach, it is Cocoa Beach, OK. This is from a long time ago; this is one of my students, actually this is his wife and the student himself. Now there's another example, a sports image. These two are real images, and we want to take this guy and put him here. OK, and this is what we get: now we also have a shadow, and this is artificial, but it looks pretty real. One more example: we want to take this guy here and put him here. Pretty real, and now we have a shadow also. We can do this in video too, and this is
an example from a video. This is a movie; I don't know if you guys have seen this. This is from, I think, what, five, eight years ago. Which movie? Say it louder. No, I don't think it's that. This is taken in Paris; this is on top of this place in Paris where visitors go. And this is another movie. Do you remember this movie? So you guys don't watch many movies, you see. This movie is, I think, Schindler's List. What is happening here is that this guy is real, but the other guy who's just standing there... OK. So in this one a little kid will appear near this guy. So you saw a little kid here; this one is artificial, I mean it is taken from somewhere else. That kid who is running wasn't in the original movie, and it's pretty nice, it looks very real. So you can do these kinds of things. A child, this little child here. OK, let me play this again. In this there are two kids, right? Here you will see there's one kid, and there's another little kid here. The little kid is taken from another video. OK, so this kid was in the movie; this other kid here we put in from another video. It looks pretty real, no? See the shadow and all these things. OK, so this is fun. In this one also, the guy who's running is real, and this other guy who is just standing there is artificial. We just put him there, but he has a nice shadow and it looks pretty real. OK, so you can do fun things.

Now the other thing related to camera calibration or the camera model is what is called pose estimation. Pose estimation involves this: we have a 3D object model and a picture, and we want to find out what rotation and translation we should apply to the model so that, when we check it in the image plane, it matches the image. So it's a matching from a 3D model to the 2D image, because images are 2D and actual objects are 3D. So
now I can take a picture from any viewpoint, with any orientation of the camera, and so on. So how do I know what the pose of the camera, or the pose of the object, should be so that when I take the picture it exactly matches the image?

I'm going to show you another video here which I think is interesting. This is a video on YouTube, and it explains the application of this kind of thing. I don't know why... OK, so hopefully this will come up here. It has several things, but you will see here... OK. So here it's showing you that this is an application of object recognition, and then it's going to show you these different... why is it not going smoothly? Maybe the internet is not really good here. So it's single object recognition: there are these different objects and you want to be able to recognize them. The interesting thing will be when you see the robot, but somehow the connection is slow or something. So let's go forward; you may like that part. Do you guys know how to refresh this? How do I refresh? OK, now maybe. So as you see here... no, it just keeps doing that. It was good on YouTube. You guys don't watch YouTube videos? Mouse button here? OK, it just keeps... OK, just get out of this. Shall we try again? No. OK, so still it's... because the connection is slow, is that right? Which one? OK, yeah, it's working now.

OK, so the main thing I wanted to show you is this. There are these objects: there is a Coke can, there are other objects, and the robot wants to grasp an object, take it from there, and put it here. So how do you do that? It begins with a picture: it takes a picture, like here, and then it's going to put the object somewhere else. In order to be able to do that, the robot needs to know the pose of this object so that it can plan the different angles of its articulation and pick the object up. And the way to do that is that they have 3D objects, of
these, these 3D models of these objects. So how do we play this again? OK. So the main thing is: given a 3D object and given a picture, how do I determine the pose of the object? That is pose estimation. It's very related to camera calibration, and we are going to talk about how to determine it. Once we know the pose, we can program the robot so that it can go and grasp the object, lift it up, and put it somewhere else. OK, so I'll send you the link and you can watch this at home. We will go ahead and move forward... and I think somehow now it's working well.

So this is the virtual creation of a scene. These are the objects which we have modeled, and we can visualize them from many different viewpoints, so this is like graphics: we have a 3D model. Now this is a real object. We take a picture, and based on the picture, as you see from the camera, we know exactly how the object is situated, and then the robot can pick it up and put it somewhere else. It's going to pick up the cereal box similarly, as we did before. It's going to pick up another object, which is a box, and so on. OK, so you should get out of there.

So now you know these are pretty good applications of computer vision, and the main thing it boils down to is how we relate different coordinate systems, because ultimately we are dealing with numbers. So we have a coordinate system. Say the origin of the coordinate system is where the trash can is, at the bottom of that; that is (0, 0, 0). So we have 3D coordinates. Now suppose this is your x axis, this is your y axis, and this is the z axis. Now I can define anything in this room with respect to that origin, and I will have units like inches or feet or centimeters or meters. I can say where someone is sitting, I can say where that projector is; I can have the coordinates of any point here. So this is the way we are going to relate the geometry of the scene. OK, so that's
a coordinate system. It's called the world coordinate system, and you can put it anywhere you want, but you have to be consistent about it: always here, or always there, and so on. Then we will have the image coordinate system. When you take a picture you have a number of rows and a number of columns, so you will have, say, x equal to 30 and y equal to 90. Those are image coordinates, which are called small x, small y; the world coordinates are called uppercase X, uppercase Y, uppercase Z. These coordinate systems have to be related to each other so that the robot knows exactly where that Coke can is, what its location is, so that it can plan: I'm here, so how do I change the angles of this arm so that it can actually reach that (X, Y, Z), and then how do I move it to put the can somewhere else? Similarly, there's another object whose location and orientation are different, so it will arrange the angles of the robot arm to pick that up, and so on.

It's the same thing, as you saw, when we put an artificial object in the scene. We need to know exactly where to put it, and since we have the 3D object and a 2D picture, we need to know what the focal length should be, what the scale should be, and all these things, so that it appears real. If you just take the object from one picture and paste it into the other picture, it won't look real, because it won't have the real perspective effect. The problem is that we have a 3D world and we are taking a 2D picture, so how do we make it real?

In order to do all those things we need to talk about the basic transformations, how to relate one coordinate system with another coordinate system: in this case the world coordinate system, which is 3D, with the camera coordinate system and the image coordinate system, which is 2D. OK, so we are going to quickly review this. You already know some of it, but for the sake of completeness I'm going to go through it again. So listen to these carefully, and
at the basic level each is very simple, very trivial, anybody can understand it. But we are going to build on those and come up with one camera model at the end, which may look a little complex; if you understand these simple operations then it's actually fairly straightforward.

OK, so the simplest transformation is 3D translation. We have a point (x1, y1, z1); suppose the center of this object is at (x1, y1, z1). We translate it in 3D, and the translation is dx in X, dy in Y, and dz in the Z direction, and we get a new location (x2, y2, z2). We can write this transformation down as a translation matrix, which is a 4 by 4 matrix, and these are the displacements. For short we will call this the translation matrix T, which is a 4 by 4 matrix. The inverse of this translation matrix is basically exactly the same thing but with minus dx, minus dy, minus dz. So this is the inverse of that, and if you multiply these two matrices (you should do it at home) you will get the identity matrix. The identity matrix means that on the diagonal you have ones and the rest of the elements are zero. OK, so that's a very simple idea, and this is done here.

The next thing is scaling: you take (x1, y1, z1) and multiply with the scale, sx in X, sy in Y, and sz in Z, and you get a new location (x2, y2, z2). For this also we can write down the scaling matrix. Now on the diagonal we have these scaling elements sx, sy, sz, again in a 4 by 4 matrix, and then we have the inverse of that, which just has 1 over sx, 1 over sy, 1 over sz. If you multiply these two you will get the identity matrix again, and you should verify that.

You also know the rotation matrix. This is a rotation around the z axis: we have a vector here, we rotate it around the z axis so it ends up here, and we can relate X, Y, Z with X prime, Y prime, Z prime by looking at these angles phi and theta. We have done this before, and we end up with something like this: X, Y, Z give X prime, Y prime, Z prime through these terms in cosine theta and sine theta. This is a rotation around the z
axis in the counterclockwise direction. All of these you should know; in the midterm I'm going to ask you to write this down, no? It's pretty simple, it's not that hard, but make sure you understand it.

Now the other thing is the inverse of the rotation matrix, and this is the inverse of the rotation around the z axis by angle theta. You will notice that the inverse of a rotation matrix is actually very easy: it's just the transpose. If you take this matrix, which is shown here, and transpose it, which means take the rows and make them columns, then this becomes cos theta, sin theta, 0, which is here; this row becomes this column, and similarly this row becomes this column. So this is the inverse of this, but it is actually the transpose of this. If you multiply them together you get the identity matrix, because cos squared theta plus sin squared theta plus 0 is 1, and similarly if you multiply this with that you get minus cos theta sin theta plus sin theta cos theta, and they cancel and become 0, and so on. OK, so the inverse of the rotation matrix is the transpose, as shown here: I can take the rotation around the z axis by angle theta, and for the inverse I just find the transpose. When you multiply the transpose with the one without the transpose, you get the identity.

Now the other interesting thing is that these rotation matrices are a special kind of matrix called orthonormal matrices. Orthonormal means that if you take a row (or a column) and find its dot product with itself, suppose this one, it becomes cos squared theta plus sin squared theta plus zero, which is one. But if you find the dot product with another row, like this one, it is cos theta times minus sin theta, plus sin theta times cos theta, plus zero, which becomes zero. So these are matrices for which, if I find the dot product of row i and row j, then if they are the same it becomes one, and if they are
different it becomes zero. OK, and we are going to use this property in camera calibration. So that's fine.

We also talked about the Euler angles. We can have a rotation around an arbitrary axis, and we break it into three rotations, around the x axis, the y axis, and the z axis, and it becomes like that. If we assume these rotation angles are small, it becomes a simpler Euler angle matrix, which is a 3 by 3 matrix in which we have these angles alpha, beta, gamma.

Now, we also talked about perspective projection, which relates the 3D points in the world with the 2D points in the image. OK, so that's the pinhole camera model. We have here an object at (X, Y, Z), there is a lens, and this is our image plane. The ray comes here, goes through the lens, and makes an image here. From here to here the distance is called the focal length, f; from here to here is Z, the depth. The origin is here, this is uppercase Y, and this is small y. Looking at these two triangles, which are similar, we can write down minus y over Y equals f over Z, and we can rearrange. Now we get y, the image coordinate, related to the world coordinates, which are uppercase Y and Z, and this is the focal length; and then this is for x. We have done this already.

Now if the origin is here, as compared to the origin at the lens, we will have a slightly different relationship. This will again be minus y, this is uppercase Y, and this is f, but this distance is now Z minus f, because from here to here is Z but we are looking at this part, and there is already f here, so we have Z minus f, which is this one. So then y becomes like this, and actually becomes like that. So there is a little difference depending on where you assume the origin is.

OK, so then the question is how we can come up with a perspective matrix. See, we have a matrix for translation, for scaling, for rotation; we did that. Now we would like to come up with a matrix for the
perspective transformation, but it's nonlinear, because there is a Z in the denominator. The other ones were linear; everything was in the numerator. So we want to come up with this matrix which defines the perspective transformation, and the trick is that we are going to use what are called homogeneous coordinates. We have X, Y, Z, and these are called Cartesian coordinates. We convert them to homogeneous coordinates, and it's very simple: we just multiply with this constant k. Every element here becomes kX, kY, kZ, and the fourth component is k. So from 3D we go to 4D, OK, to simplify things. So these are the Cartesian coordinates and these are the homogeneous coordinates. Then we can convert from homogeneous coordinates to Cartesian coordinates again: take the fourth element and divide the first one, the second one, and the third one by it. So this is like the inverse operation: here we are multiplying, and here we are going to divide.

Now if we do that, then the perspective transformation can actually be written like this. We have 1 0 0 0, 0 1 0 0, and so on, like the identity matrix, and then in the fourth row we have 0, 0, minus 1 over f, 1. OK, so let's see how it looks. This will work only with homogeneous coordinates. We have kX, kY, kZ, and k, and there is the perspective transformation matrix, so we get the homogeneous image coordinates, which I call ch1, ch2, ch3, ch4. We multiply this vector by the first row: that becomes kX, and the rest are zero, so we get kX. Multiplying with the second row gives kY, and the third gives kZ. For the fourth row we get 0, 0, then kZ times minus 1 over f, plus k. So that's what we have here.

Now we have the image coordinates in the homogeneous system, and we want the Cartesian ones. We take the fourth element, divide the first one by it, and divide the second one by it; the first one becomes the x image
coordinate and the second becomes the y image coordinate. That's what we have: ch1 divided by ch4. ch1 is kX and ch4 is this one, so this is kX over (k minus kZ over f); k is common and cancels, and we get this. Then ch2, which is kY, divided by ch4 gives this. So this is exactly what we had previously for the perspective transformation. So we are able to come up with a 4 by 4 matrix: as we have done for rotation, translation, and scaling, now we have one for the perspective transformation.

OK, so now... yes? What is f? f is the focal length; see, I'll show you here. It is the focal length of the camera, which is the distance from the lens to the image location. When you take a picture you can change the focus, so that's what it is.

OK, so now we can come up with the camera model. Whenever we take a picture with a camera we have to consider some world coordinates, and all these things will be related to that, as I said earlier. So let's say the camera was there, at that origin of the world coordinates. If it is there we cannot see much, so we are going to translate it: take it and put it somewhere else, with some X, Y, Z translation. We will call that translation the matrix G. Whatever we do, we have to account for it; we have to know that it was there and how much we translated it. That's what the robot does, because a robot is not human; it really has to calculate: it was here, now I'm going to go there, and how much do I need to move in X, Y, Z? So first we are going to translate by this G matrix. Then we are going to rotate, for example around the z axis in the counterclockwise direction; that's the second transformation. Then we are going to rotate around the x axis in the counterclockwise direction, and then we are going to translate again by the matrix C. So this is a series of transformations, and for every transformation we have a matrix. OK, so we start with the world point, let's say the world point in homogeneous form, and
all this we are going to do in the homogeneous coordinate system, which, recall, is nothing but this: take the 3D coordinates, multiply each of the three elements by the constant k, and add k as the fourth one. That's it. So we have the world point, which is W in the homogeneous coordinate system. We apply the translation, then the rotation around Z, then the rotation around X, then the translation again, and then the perspective transformation to take a picture. This will give you the image coordinates. Because we are moving the camera and so on, and then taking a picture, which is this perspective transformation, we can take any point in 3D whose X, Y, Z we know and find out what its image coordinates will be, small x and small y, if we know the focal length. So it's very nice, it's pretty simple, but it's an important model we need to know.

Yes? I mean, we just assume that my camera started there: I put a camera there, by the trash can. Then I want to take a picture, so I move the camera from the origin: translate it, then rotate it, then rotate it again, then translate it, and then take a picture. This is just one example; you can do it some other way, but whatever you do you have to account for it. That's what the robot is going to do, no? It needs to know: OK, I need to translate my arm this way, then I need to rotate, and so on. So these are the series of transformations we need to account for; then we know exactly where that point in 3D is going to be projected in the image, at what pixel column and row.

There was another question? Yeah, but I mean, I just took an example. Maybe you decide to use X instead; you can do anything. It was just an example.

OK, so now one thing you need to understand is that we are moving the camera here; I'm not moving the object. Either you can translate or rotate the object while the camera is fixed, or you can rotate and translate the camera while the object is fixed.
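
The series of transformations described in the lecture (translate by G, rotate around z, rotate around x, translate by C, then apply the perspective matrix) can be sketched in NumPy. This is only an illustration under assumptions: the function names and the sample displacements, angles, and focal length are hypothetical, and whether each matrix should be built from a displacement or its negative depends on whether the camera or the object is taken as the thing that moves, which the sketch leaves to the caller.

```python
import numpy as np

def translation(dx, dy, dz):
    """4x4 homogeneous translation matrix."""
    T = np.eye(4)
    T[:3, 3] = [dx, dy, dz]
    return T

def rot_z(theta):
    """4x4 counterclockwise rotation around the z axis (theta in radians)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

def rot_x(phi):
    """4x4 counterclockwise rotation around the x axis (phi in radians)."""
    c, s = np.cos(phi), np.sin(phi)
    R = np.eye(4)
    R[1:3, 1:3] = [[c, -s], [s, c]]
    return R

def perspective(f):
    """Perspective matrix from the lecture: identity with -1/f in row 4, column 3."""
    P = np.eye(4)
    P[3, 2] = -1.0 / f
    return P

def to_homogeneous(point, k=1.0):
    """(X, Y, Z) -> (kX, kY, kZ, k)."""
    X, Y, Z = point
    return np.array([k * X, k * Y, k * Z, k], dtype=float)

def camera_matrix(G, Rz, Rx, C, P):
    """Single 4x4 matrix; G is applied to the point first, P last."""
    return P @ C @ Rx @ Rz @ G

def project(M, point, k=1.0):
    """Image coordinates (x, y) = (ch1 / ch4, ch2 / ch4); ch3 is not used."""
    ch = M @ to_homogeneous(point, k)
    return ch[0] / ch[3], ch[1] / ch[3]

# Hypothetical values, chosen only for illustration.
G = translation(-1.0, -2.0, -3.0)
Rz = rot_z(np.deg2rad(30.0))
Rx = rot_x(np.deg2rad(20.0))
C = translation(0.0, 0.0, -0.5)
P = perspective(0.05)

M = camera_matrix(G, Rz, Rx, C, P)
x, y = project(M, (4.0, 5.0, 6.0))
print("image coordinates:", x, y)
```

The constant k cancels in the final division, so `project` returns the same image coordinates for any nonzero k.
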
They are relative things, all right? What we have talked about so far, for example rotation, is that we are rotating the object, translating the object, and so on, and the camera is fixed. Here we are rotating and translating the camera and the object is fixed. So in order to model this we are going to use the inverse transformations. If I translate the object, that's one translation and now the object is here; if I want to get the same effect by moving the camera, then I have to move the camera in the opposite direction. So that's the whole idea.

Yes? The diagram that you had for the perspective? Yes, so let's look at what you were asking about, this one. This is a general thing; it doesn't have to be related to any perspective model and so on. I have 3D coordinates and I make them homogeneous: take k, multiply it with all three, and add it as a fourth element, and that becomes a four-dimensional coordinate. That's the homogeneous transformation. And the perspective part you understood, how we got those things. So you put these two together, and then we have this.

Yes? The W, which is here? So W is another matrix? W is a point. We have a 3D point expressed in homogeneous coordinates, and we want to find out what the image coordinates are. There's a world point, one point on the object in the world coordinate system in 3D, and we want to find out what the image coordinates will be in the camera if you do these transformations. OK, so that's what we have.

So therefore we can expand this, and we will have these matrices. G: we translate the camera by this amount, x0, y0, z0. Then we have the rotation around the z axis, which is this matrix; then we have the rotation around the x axis, and we have this matrix; we have another translation with this one; and then we have the perspective transformation which we just explained. So take all these matrices, there are five of them, and multiply them together. The order is important, because
first we translated by G, then rotated around the z axis, then the x axis, then the translation, then the perspective. So this is the order we are going to use. We multiply, and you should do that at home, and you're going to get something like this. You get the image coordinates in the homogeneous coordinate system, which are four-dimensional. From there you get x: take the first homogeneous coordinate divided by ch4, and that becomes the x image coordinate, and the y image coordinate is ch2 divided by ch4. This is what the expression will be. It involves the theta angle, it involves the phi angle, the rotation angles; it involves x0, y0; it involves r1, r2, r3 and so on, and the focal length. So this is the camera model we have, and that's essentially what the robot is going to do. OK, so everything is very simple, but you have to do it: you have to get these matrices, multiply them, do it correctly, and then go from homogeneous to Cartesian by dividing by the fourth element, and that's it. OK, so that is our camera model.

Now we can treat it like this: we have these 5 matrices, and we could have 6 matrices, n matrices, we can have lots of them. Each is 4 by 4. If you take 5 matrices that are 4 by 4 and multiply them together you again get one 4 by 4 matrix, which is this one. So there are lots of things we did, but in the end it is one matrix, the product of these matrices. Then we can write it down: ch1, which is the image x coordinate in homogeneous form, is this vector multiplied with the first row, which is X a11 plus Y a12 and so on; ch2, for the y coordinate, is like this; and ch4 is like that. Now, in this, ch3 is useless, because we don't have 3D image coordinates; we have 2D coordinates. So we are only interested in ch1, ch2, and ch4, because we need to convert to Cartesian, so we need to divide by ch4: ch1 divided by ch4, ch2 divided by ch4; the
first becomes x and the second becomes y. So that's what we have. Therefore, if we want to find the camera model, we actually need to know these 12 elements of the matrix. We don't need to know all 16, because ch3 is useless for us. OK, so those are the image coordinates, defined as I explained.

Now the question is how we determine this camera model, these twelve elements in this matrix. One way is that you have to know exactly what was done: I put the camera at the origin, then I translate it, rotate it, rotate it, translate it, and this is my focal length; I can put those values into the expression I showed you and find the camera model. But is there any other way to do it, a better way? The way we are going to do it, and that is what is called camera calibration, is that we will take some known 3D points in the scene. We'll take maybe the camera there; we know its 3D coordinates with respect to the origin. We'll take this chair, or some other things. So you take known 3D points, and then we take a picture, and we can identify where each point is in the picture: the image coordinates for the camera, the image coordinates for the chair, and so on. Then we will try to find these camera elements, the 12 unknowns of the matrix, by using a least-squares fit.

A typical example is that we will have a checkerboard like this. We have these alternating black and white squares, and we can define the world coordinate system anywhere we want: say this is my origin, this is X, this is Y, and I'm going to put it at some height, at some Z. Then I can define each of these points in world coordinates. I'll have a sheet of paper with this pattern, and I know how many centimeters each square is, so I'll have the 3D coordinates of each of these points. Then I take a picture of it, and the picture may look like that. OK, so I know the 3D coordinates from
here and I know the corresponding 2D coordinates; then I can do the least-squares fit. Remember, this was our model: we have the camera matrix here, these are the world coordinates, the 3D coordinates, and these are the image coordinates. We are going to use these equations to come up with these unknowns: a11 to a14, a21 to a24, and a41 to a44. We don't care about a31 to a34 because that row is useless. OK, so again, ch1 is given by this, ch2 is given by this, and ch4 is given by that. Then we can try to rewrite it like this. Now we have a11 X plus a12 Y and so on, up to a14, minus ch4 times x; ch4 is given by this, and we multiply small x into each element. So this becomes minus a41 small x uppercase X, minus a42 small x Y, minus a43 small x Z, and then minus a44 x. This is the first equation, and similarly you have the second equation from here, which is shown here: we just put in ch4 and bring it to the left side.

So now we have two equations which relate the 3D coordinates, which are uppercase X, Y, Z, the image coordinates, which are small x, small y, and the a's. When we do the calibration we will know the uppercase X, Y, Z, and we will also know the image x, y; the unknowns are the a's. Here we have two equations and 12 unknowns, so if we have just one point we cannot solve this. But we can take several points, and then we can solve it, and that's what we are going to do. One point gives two equations, so with n points we get 2n equations. These 2n equations are linear equations, and we can do the least-squares fit again. This is another example of the least-squares fit we have been talking about, like in Lucas-Kanade: there we have two unknowns, and we can take a 3 by 3 neighborhood, which gives nine equations, or 5 by 5, which gives twenty-five equations, for two unknowns.

OK, so now one other thing here: we can put this in matrix form. These are our unknowns, these are the knowns, and this is the zero vector. We can call this matrix C, we can call this vector P, and
the zero vector is like that, a big vector. The matrix has 2n rows: suppose we take ten points, it will have 20 rows, but our unknowns are twelve. Now one other thing is that this is a homogeneous system. It's a linear system, but homogeneous means the right side is zero, which means it does not have a unique solution. It has lots of solutions, because the right side is zero, so I can multiply the left side by anything and the right side is still zero: CP is equal to zero, but 2CP is also equal to zero and 3CP is also equal to zero. So it's a homogeneous system. What that means is that I can arbitrarily select one of these unknowns and then solve for the remaining eleven unknowns. I did that by selecting a44 as one; then instead of 12 unknowns I have 11.

Now, since this is known, I can bring it to the other side. The known becomes one, basically, and it is multiplied by x1, so on the other side it becomes x1, and then x2, and so on. The matrix now has 2n rows but 11 columns, because I got rid of that column. So now it becomes like this: the same system, but I got rid of a44. We'll call this matrix D; this is the Q vector, which I want to find; and this is the R vector, which I know, because these are the image coordinates. This matrix is not square, it is 2n by 11, but I can force it to be square with the pseudo-inverse: I multiply by D transpose on both sides, and D transpose D is square, so I can invert it and bring it to this side. Then I can find Q like this: Q equals the inverse of D transpose D, times D transpose R. We have done this many, many times, so you need to be very good at it; this will be asked in the exam, so you should be able to do it.

Are there questions? If you don't know, ask me, stop me. But this is nothing, it's very simple: an equation has a left side and a right side, and you just transfer the term. That's right.
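
The least-squares step described above, with a44 fixed to 1, can be sketched in NumPy as follows. This is a minimal sketch on synthetic data, not the course's own code: the function names, the ground-truth matrix `A_true`, and the random 3D points are all hypothetical. It also assumes the known 3D points are not all in one plane, so that D transpose D can be inverted.

```python
import numpy as np

def build_system(world_pts, image_pts):
    """Build D (2n x 11) and R (2n,) for the unknowns
    q = [a11, a12, a13, a14, a21, a22, a23, a24, a41, a42, a43], with a44 = 1."""
    rows, rhs = [], []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        # a11 X + a12 Y + a13 Z + a14 - a41 xX - a42 xY - a43 xZ = x
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z])
        rhs.append(x)
        # a21 X + a22 Y + a23 Z + a24 - a41 yX - a42 yY - a43 yZ = y
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z])
        rhs.append(y)
    return np.array(rows, dtype=float), np.array(rhs, dtype=float)

def calibrate(world_pts, image_pts):
    """Solve the normal equations (D^T D) Q = D^T R and return the 3x4 matrix
    whose rows are (a11..a14), (a21..a24), (a41..a44)."""
    D, R = build_system(world_pts, image_pts)
    Q = np.linalg.solve(D.T @ D, D.T @ R)
    return np.append(Q, 1.0).reshape(3, 4)

def project(A, world_pts):
    """Image coordinates (ch1 / ch4, ch2 / ch4) for each 3D point."""
    W = np.hstack([world_pts, np.ones((len(world_pts), 1))])
    ch = W @ A.T  # columns are ch1, ch2, ch4
    return ch[:, :2] / ch[:, 2:3]

# Synthetic ground truth (hypothetical values), already scaled so that a44 = 1.
A_true = np.array([[2.0, 0.1, 0.3, 0.5],
                   [0.2, 1.5, -0.4, 1.0],
                   [0.1, 0.2, 0.3, 1.0]])

rng = np.random.default_rng(0)
world_pts = rng.uniform(0.0, 1.0, size=(10, 3))  # ten non-coplanar known 3D points
image_pts = project(A_true, world_pts)           # their noise-free image coordinates

A_est = calibrate(world_pts, image_pts)
print(np.round(A_est, 6))
```

With ten points the system has 20 equations and 11 unknowns, matching the count in the lecture. In practice `np.linalg.lstsq(D, R, rcond=None)` minimizes the same squared error and is usually numerically safer than forming D transpose D explicitly.
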
yeah. So, as you saw here, this was multiplied by x1: a44 x1, a44 x2, all these things. Now a44 is 1, so it becomes just -x1, -x2, -x3; we bring it to the other side and it becomes plus, and that's what we have. Yes, 11 columns now. Before we had 12 columns, if you count these; now we have 11 columns. No, no. Let's listen carefully. What we have is this system. We know everything here: we know uppercase X, Y, Z, we know small x, y. The unknowns are these 12 unknowns. I'm saying let's select a44 as 1, so now we have 11 unknowns. Therefore, if this is 1, the system is saying that the first equation is X1 a11 + Y1 a12 + Z1 a13 and all these terms, and this last term is now -x1, equal to 0. That's the first equation. Yes? No, I think you are confused; listen to what I'm saying, this is pretty simple. All right, let's look at this one, this is easier. This is my equation here. I have a11, a12, a13, a14; I don't have a21, a22, a23, a24; I don't have a31 and those a's; and then I have a41, a42, a43, a44. So this is the first equation. In this one I have eight unknowns, but the whole system has 12, because the other equation, the one for y, has the other four. Now, what I'm doing is assuming a44 is 1, which means that in this equation I can bring x to the right side, because it's known. So this equation becomes a11 X + a12 Y and all this, equal to x. So now in this one there are seven unknowns instead of eight, because I decided a44 is 1. Therefore I can write this equation as I have shown here, so that this
is exactly the same: this product equal to x1 is exactly the same as this equation here when a44 is 1. So I can easily write it this way, because I have decided a44 is 1: I take the last column and bring it to the right side, because x1 is known, x2 is known, x3 is known. Yes? The reason is, as I said, that this is a homogeneous system, which means the right side is zero. Let's say somebody tells me the solution is P1; P1 is a vector with some values. Then I will tell you that 2 multiplied by P1 is also a solution, because if I multiply by 2 on this side, the other side is still zero. In the same way 3 P1 is also a solution, because of the zero on that side. So there are many, many solutions, and there is no way I can have a unique solution, which means I can select any element arbitrarily. I say a44 is 1; now I have 11 unknowns. So that's the idea; this is simple linear algebra. If you still don't understand, I can explain it to you later, but this is pretty straightforward. Once we know that, we can find the camera parameters, the 11 unknowns, and in a way we are done with the calibration of the camera: we get 3D points in the world by putting up a checkerboard pattern for which we know the 3D points, we take a picture, then we know X, Y, Z and small x, small y for lots of these, then we do the least squares fit and find these unknowns. See, that's what I explained to you; you are not listening. This D is not a square matrix, so you cannot invert it. Listen carefully, this is what I've been saying: it is not a square matrix, because it has eleven columns and many, many rows. That's what the whole least squares fit idea is: we have more
constraints than unknowns; it's called an over-constrained system. If you have two equations and two unknowns, it is a constrained system and you can solve the linear system directly. In Lucas-Kanade we have two unknowns and we look at the 3 by 3 neighborhood, 9 equations, so we have an over-constrained system and we do a least squares fit. We have a rectangular matrix, we multiply by its transpose, and D transpose D becomes a square matrix. So that's the idea. Now, once we can find these camera parameters, we can do lots of interesting things. One thing we can do is find the location of the camera by looking at the picture. Let's say we have a 3D point here and its image is formed here. As you know, images are formed from 3D points: we draw a line through the lens to the image plane; that's perspective projection. Then there is another 3D point; we draw a line from there to the lens, and its image is formed here. Now, from the picture and given the camera matrix, we want to find the location from which the picture was taken. What we can do is find the intersection of these rays; that is basically the location of the camera. So that's the process; there was a paper where they actually did that. We take one 3D point, X1, and find its image coordinates in the homogeneous system, which is this one. The third component, as we said, is not really useful, so we make it 0 and find the 3D point corresponding to that modified image point. That 3D point will be this one, X11, because this is the image of these points and they have to lie on the ray. So if X1 was imaged here, and we make the third element 0, then the world point corresponding to that will lie somewhere on the ray, suppose here. Then we
will take another 3D point, X2; its image is here. Again we play this trick: we make the third element zero and find its corresponding 3D point, which is X22. Then we intersect these two rays and we find the location of the camera. It's very simple. Mathematically, this is our camera matrix, and we talked about how to find it. Once we have it, we can relate the 3D coordinates to the 2D coordinates in the homogeneous system. In simplified form we say U1 is equal to the matrix multiplied by X1, as I showed you in the picture, and U1 has the homogeneous image coordinates ch1, ch2, ch3, ch4. We just make ch3 zero; this becomes another point, which we call U1 prime. We multiply it by the inverse of the matrix and call the result X11. So now we have two 3D points: X1, which we selected, and X11, which we get by reprojecting the image point back to 3D after setting the third element to 0. Similarly, we take another 3D point X2, get its image point U2, make the third element 0, multiply by the inverse, and get the other 3D point X22, corresponding to X2, as I showed you in the picture. So this is X1 and this is X11; this is X2 and this is X22. We can draw a line here and a line here; they intersect, and that's the location of the camera. That's it, it's pretty simple. Now we can do another thing: we can find the orientation of the camera. That was the location; now we find out how it was oriented when the picture was taken. The idea is very simple. As you see here, say the object is here, the lens is here, and the picture of this object is formed here. Now let's say we move the object closer to the lens, say to here; again an image is formed here. The difference between these two is that, since the object
moved closer to the camera, the image is bigger, at a larger scale. Move it closer still and it becomes even bigger. So we keep moving the object closer and closer to the camera. In the extreme case, when the object is at the lens itself, the image is formed at infinity: we want to draw a line from the object which goes through the lens and also hits the image plane, and it is going to hit at infinity; as you see, it keeps moving out like that. That is the fact we are going to use to find the orientation of the camera, which is pretty interesting. So we have the system: we have the X, Y, Z world coordinates, we multiply by A and get the homogeneous image coordinates, and the image coordinates are defined like this: x is ch1 divided by ch4, y is ch2 divided by ch4. Those are the Cartesian coordinates. From these two equations, x and y become infinity when ch4 is equal to zero. So that's what we are going to use. In this system ch4 is given by a41 X + a42 Y + a43 Z + a44, and setting it to 0 gives the equation of a plane in X, Y, Z. So a41, a42, a43 give you the surface normal of the plane in which the camera was oriented. Very easy, but it's good reasoning. OK, let me give you an example. Somebody actually did this for San Francisco. They did all this and found the camera matrix, shown here, for this picture of San Francisco. Using this camera matrix they found that the camera was located at the intersection of California and Mason streets. How many of you have been to San Francisco? There's a Mason Street and there's a California Street, so it was actually there. It was 430 feet above sea level, and it was oriented at an angle of 8 degrees above the horizon, and these were the focal length and the other parameters. So this is the location; this is a map, this is the
image. This is another example: there's the camera matrix and the picture. For this one they found that the camera was located 1,200 feet above sea level, and this was the orientation of the camera, four degrees above the horizon, and there's the focal length. This is nicely described in a very nice paper by Tom Strat, for which I'm putting the reference here. So that is the first part. Do you have any questions on this part before I go to the next one? Yes. So the orientation works this way: we are using the fact of how the image is formed. We have the object, shown here, and its image is formed here; the rays have to go through the lens. Say the object was at this distance, and this is its image, of this length. Then we move the object closer to the lens, and the image becomes bigger. We move it closer again and it becomes even bigger. That's obvious; that's what happens when you take a picture: when you move closer, things get bigger and bigger in the picture. Now, what is the extreme case? Let's say the object is exactly at the lens. Then the image is at infinity, because the ray is a straight line that is never going to hit the image plane. Then we ask how we find the image coordinates. This is what we have been talking about: the x image coordinate is given by ch1 divided by ch4, and the y coordinate by ch2 divided by ch4. So x becomes infinity and y becomes infinity, and that happens only when ch4 is 0. And what is ch4 equal to? We multiply the world point by the fourth row of this matrix, and that is a41 X + a42 Y + a43 Z + a44. When you look at it, this is the equation of a plane, and the normal of the plane is a41, a42, a43. So actually you don't have to do anything: you just look at the fourth row and it gives you the orientation, the normal
to the plane in which the camera was positioned, which is very interesting. OK, any other questions? Go ahead. Who else was it? Yes, the focal length. See, the camera has these extrinsic parameters, which are the translation and rotation, and we will talk more about them with another model, and the intrinsic parameters, of which the most important is the focal length f. So when we say that we calibrate the camera, we need to know two sets of parameters, extrinsic and intrinsic. The intrinsics are the focal length and also the scale by which we convert pixels to inches or feet: 3D is in inches and feet, but it is mapped to pixels. So we will talk about those factors, and that's your full camera model. Yes? So you can see that, given the surface normal, which defines the orientation of the plane, you can find the angles with respect to X, Y and Z. Suppose I have 3D axes X, Y, Z here and a normal like this; that's one way to represent the orientation. Once you have the normal, you can find the angle with respect to X, with respect to Y, and with respect to Z; it's simple math. And there are other ways: if you go to my book, Fundamentals of Computer Vision, one way to define rotation matrices is what is called the Rodrigues formula. There you don't have a matrix, you actually have a vector. So you can define a rotation about the x-axis, the y-axis and so on, like the Euler angles we did, or you can say that we are rotating around an arbitrary axis. These are just different ways to define the same thing. If I know the plane and its orientation, then I can find the angles which will align it with respect to the world coordinate system. The world coordinate system also has an orientation: it can be just
like that, or it can be like that, and if I have another 3D coordinate system, the question is how they are related. OK, any other questions? So, to summarize what we just finished: camera parameters. We have extrinsic parameters; these are the parameters which define the location and orientation of the camera, that is, the 3D translation vector and the 3 by 3 rotation matrix. And we have intrinsic parameters, which are necessary to link the pixel coordinates with the image point; they include the perspective projection, the focal length, and the transformation between camera frame coordinates and pixel coordinates. We will revisit the camera model one more time and talk about a very similar but slightly different model, in order to be able to derive these exact parameters in terms of rotation, translation, focal length and so on. In what we discussed earlier we did the camera calibration by finding the A matrix, these twelve or eleven unknowns, and then we had the mapping from 3D to 2D, but we didn't directly have the rotation and translation parameters; all of these were collapsed into the coefficients of the A matrix. Here we are going to be able to come up with the rotation matrix, the translation and the other parameters specifically. So this is the simple model we start with. We have the world coordinates, X, Y, Z in the homogeneous system; this is the rotation matrix and this is the translation vector. We take the world coordinate P in homogeneous form, multiply by the rotation and translation matrices, and we get P_c, the point in camera coordinates. This matrix, which is the product of the translation and rotation matrices, we call M extrinsic. Then we use perspective projection again. Here again we have the X, Y, Z world point, and the lens is shown here; the ray of light leaves the object, goes through the lens, and the image is formed here. Now, what we are saying here is that the image plane
is actually in front of the lens. Normally the image plane is behind the lens; this is just to make the equations simple. In this case, too, the distance from the lens to the image plane is f, the focal length; the distance from here to the point in the Z direction is Z; this is Y; and this is the image of the point in the image plane. The figure goes just from 2D to 1D, and this is small y, which is the projection of capital Y. In this model we can again look at the similar triangles, one big and one small, with the origin at the lens, and derive this relationship: small y, the image coordinate of the point, divided by uppercase Y, the world coordinate of the object, is equal to the focal length divided by Z. So y = fY/Z, and similarly x = fX/Z. We can write this perspective model as a perspective matrix. We have a 3 by 3 matrix and we have X, Y, Z. If you multiply by the first row you get fX plus 0 times Y plus 0 times Z, so it becomes fX; the second row gives fY; and the third row gives Z. Since this is homogeneous, we divide by the last component and get fX/Z and fY/Z for the image coordinates in the camera frame, as we have been discussing. Now we are going to relate the image coordinates, which we'll call x_im and y_im, to the camera coordinates, which are just x, y. The image center in pixels is (o_x, o_y), and we have the effective size of a pixel, (s_x, s_y), in millimeters or centimeters in the horizontal and vertical directions. Using these we can write x, y, the camera coordinates, in terms of the image coordinates, the image center o_x, o_y, and these scaling factors. And we can rewrite this: x_im = -x/s_x + o_x, and similarly y_im = -y/s_y + o_y. That gives you the transformation from the camera coordinates to the actual image coordinates in
terms of the origin translation and these scaling factors. So now we have this model: the world homogeneous coordinates, rotation and translation, then the perspective projection, and then the last transformation, from camera to image. So this is what we have: this one is the extrinsic matrix, next is the perspective transformation, and this is the camera-to-image coordinate system transformation. We have these three matrices, and starting with the world homogeneous coordinates they give you the image homogeneous coordinates. We can multiply these two matrices, so f comes in here, and we keep the rotation and translation this way. We have been calling this the extrinsic matrix, M_ext, and this is the intrinsic matrix, M_int. So there are two matrices which relate the world to the image homogeneous coordinates, and we can multiply them to get one matrix, called M. So now we multiply these two matrices, M_int and M_ext, and we get this huge matrix. Again it is a 3 by 4 matrix, three rows and four columns. As you see, if you multiply this column with the first row you get -(f/s_x) r11 + 0 + r31 o_x, and that is the first term here; multiply the next column and you get the second term, and so on. You should be able to verify that. So that is what we get from the systematic model of translation, rotation, perspective transform, and the camera-to-image coordinate relation, and in it we have these parameters: the rotation matrix, with nine unknowns, the translation vector, the focal length, and all those things. That's what we have from the previous slide. Now we can simplify this further: since we have f over the scaling factor s_x, we just call this f_x, and the other one f_y, so now we don't have anything in the denominator and we have this kind of matrix. Now remember that we did the camera calibration: if we know the 3D points and the corresponding 2D points, then we just discussed
how we can estimate the camera calibration in terms of a matrix with eleven unknowns. From that camera matrix we now want to determine the extrinsic and intrinsic parameters using the model we just discussed: the translation, the rotation, f_x, f_y and o_x, o_y. So in a way we have two camera models. One is the M model we used previously, which we were calling the A matrix, where all these matrices are multiplied together. The other is the model we just derived, where each term has a specific meaning: it contains the rotation terms, the focal length terms, the scaling terms and the translation. We want to relate these two matrices: knowing the M matrix which we got from the calibration, from the 3D points and corresponding 2D points, we want to determine the parameters on the right side, the extrinsic and intrinsic parameters, which are the rotation matrix, the translation, f_x and f_y, and o_x and o_y. One thing to remember is that when we estimated the camera matrix we had a homogeneous system, so M was not a unique solution; it was determined only up to a scale factor. Therefore we have M hat, the estimate, which is some scale factor times M, and it is still a solution; that is always the case in a homogeneous system. So when we compare these two matrices, this one found from known 3D points and their corresponding image points, and this one which we just computed analytically, we have this scaling factor gamma. We are now going to look at how we can eliminate it, how we can find its value. The fact we are going to use is that this is a row of a rotation matrix, and a rotation matrix is orthonormal. That means that if we take a row or column of a rotation matrix, say the third row, and take its dot product with itself, then it
has to be one; if we take a row of the rotation matrix and take its dot product with another row, it has to be 0. That's the property of orthonormal matrices. So this is the dot product of the third row with itself, which gives a magnitude, and this is the corresponding third row of the M hat matrix. These have to be equal, according to this comparison, up to gamma. Since this is one, we get only the magnitude of gamma. Then we can divide each element of M hat by that magnitude to get rid of this factor. We still have to deal with the sign. The way we are going to do it is this: first we estimate the translation in depth, T_z, by comparing these two matrices term by term. We assume we have known 3D points and known 2D points, and we went through the least squares fit in the earlier part of the lecture, so we know all these terms; given these, we want to find all of those. We find T_z by comparing this with this, and also the elements of the third row of the rotation matrix by comparing these with those. Once we know that, we find o_x: in this row, this element is known, so we are somehow going to cancel it and find o_x, and in the same way we can find o_y. So these are the first two steps. Once we know o_x and o_y, we find f_x and f_y, and then r11, r12 and so on, and r21, r22 and so on, because by then o_x is known, r3 is known and f_x is known, so we can find those. Then finally we compute T_x and T_y. So that's the process we are going to follow. Let's look at it step by step. This is what we have. The first thing: since these two matrices are equal, we can take T_z, which is the fourth component of the third row, and it should equal the fourth component of the third row on the other side, with the difference that we have already taken care of the magnitude of gamma but not of the sign. So this can be positive
or negative; that's why we have this sigma here. Now we can use the fact that if T_z is positive, the origin of the world reference frame is in front of the camera, and if T_z is negative, the origin of the world reference frame is behind it. Using this we can determine the sign, whichever condition applies. Once we know the sign, then just by equating T_z with sigma m34 we find T_z, the translation in depth, and in the same way r31, r32, r33, because each is equal to the corresponding element multiplied by sigma. So now we are done with the last row of this matrix, which gives us the third row of the rotation matrix and the translation in depth, so we are making progress. Now we have the same two matrices on the left and right side; we know the left side and want to find the elements on the right side. We are going to look at these three rows; we'll call the first three elements of each row q1, q2 and q3, and we are going to manipulate those to find the rest of the elements. Let's take q1 with q3 and find the dot product: it becomes m11 m31 + m12 m32 + m13 m33. On the right side we have to do the same thing, so I multiply these elements with these, and remember that we know the third row; that's why we are using it. The first element is -f_x r11 + r31 o_x, then comes the second element, then the third, and we want to multiply the first by r31 and so on, the dot product component by component. We are actually going to do it a little differently: we break each term into two parts, this one, this one and this one, plus the remaining terms. So we have here r31 o_x and
in the second term r32 o_x, and so on, from here, and the first terms here, all of them multiplied by r31, r32 and so on. Then we do the actual dot product. Now you have to see that this is the third row of the rotation matrix, and this contains the first row of the rotation matrix, r11, r12, r13, with f_x as a common factor. So we are multiplying the third row with the first row, and because the rotation matrix is orthonormal this becomes 0. In the second part the third row is multiplied by itself, so that becomes 1: we have r31 squared o_x plus r32 squared o_x and so on, we take o_x as a common factor, and the rest is 1. Therefore o_x is equal to the dot product of q1 and q3, so now we know o_x. Similarly, if we take the dot product of q2 and q3, we find o_y, because that row contains o_y instead of o_x. So that's good. Now we know the third row, we know o_x and o_y, and we want to find f_x and f_y, because that's what remains. For that we take q1 with q1, the dot product of the first row with itself: this multiplied by itself, plus this multiplied by itself, and so on, and we do the same thing on the other side; this is the first row from this matrix and this is the first row from that one. Doing that we get, as you see, (-f_x r11 + r31 o_x) squared, then this one squared, and this one squared. When we expand the first, we get f_x^2 r11^2 + r31^2 o_x^2, and we have -2 f_x o_x r11 r31, and we square the other ones the same way. The cross terms are -2 f_x o_x r11 r31, -2 f_x o_x r12 r32, and -2 f_x o_x r13 r33, so together they are essentially the dot product of row 1 and row 3, and because of orthonormal
matrices they become 0; that's why we have not included those terms here. We can simplify further: here we take f_x^2 as a common factor, and r11^2 + r12^2 + r13^2 is 1, so that gives f_x^2. From here we take o_x^2 as a common factor, leaving r31^2 + r32^2 + r33^2, which is 1, so that gives o_x^2. Therefore q1 dot q1 is f_x^2 + o_x^2, and we already know o_x, so we can find f_x. Once we know f_x, we can find f_y similarly from the dot product of q2 with q2. So now we know o_x and o_y, we know f_x and f_y, and we know the third row and T_z, so we are very close to knowing everything. The last thing we want to determine is the first row of the rotation matrix, r11, r12, r13, and similarly the second row. If we look at this equation, we are taking this element, m11, and subtracting it from o_x m31. On the left side that is o_x m31 - m11, so we do the same on the right side: o_x r31, and minus minus becomes plus f_x r11, and minus r31 o_x. So we multiply by o_x and subtract, on this side and on the right side. In this one we know quite a few things; the only thing we don't know is r11, because we know o_x and of course all the m elements. So we bring r11 to the left side and the rest of the elements to the right side, together with the sign sigma, which we can determine as plus or minus. This way we find r11, and in the same way r12 and r13, just by looking at the other elements: the first for the first element, the second for the second, the third for the third. We can do the same for the second row, r2; here we are going to use m3 with m2,
as shown here, with different elements, and in this one we have o_y instead of o_x. So we multiply o_y with the m3 element and then subtract the m2 element. As you see, if we multiply by o_y we get o_y r31 plus this and minus that, and a similar relation holds, so we can find r21, r22 and so on. Finally we have to find T_x and T_y. As in the comparison we did to find r11, r12 and so on, we do the same thing: take m34, multiply by o_x, and subtract m14. So we have o_x multiplied by the fourth element of the third row, minus the fourth element of the first row, and that gives you T_x. Similarly, we find T_y by multiplying o_y with m34 and then subtracting m24, which is this one, because we already know T_z. So that gives you T_y. With that we have determined all these unknowns. The reading material is chapter one of my book, which talks in detail about this geometric model; this is also a good reference for the camera model I have covered, which I followed closely; and then there is the paper I mentioned, from which I gave you the San Francisco example. OK, that's it.
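The step-by-step recovery of the parameters described above can be sketched as follows. This is an illustrative sketch, not code from the lecture or the referenced book: the function names `compose` and `decompose` and the flag `origin_in_front` are assumptions introduced here, f_x and f_y are taken to be positive, and the sign convention follows the matrix read out in the lecture (first row -f_x r1 + o_x r3, second row -f_y r2 + o_y r3, third row r3, with the matching translation terms in the fourth column). `compose` builds that matrix from known parameters so that `decompose` can be checked against it.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def compose(fx, fy, ox, oy, R, T):
    """Forward model M = M_int * M_ext, M_int = [[-fx, 0, ox], [0, -fy, oy], [0, 0, 1]]."""
    r1, r2, r3 = R
    Tx, Ty, Tz = T
    row1 = [-fx * a + ox * c for a, c in zip(r1, r3)] + [-fx * Tx + ox * Tz]
    row2 = [-fy * b + oy * c for b, c in zip(r2, r3)] + [-fy * Ty + oy * Tz]
    row3 = list(r3) + [Tz]
    return [row1, row2, row3]


def decompose(M_hat, origin_in_front=True):
    """Recover intrinsic and extrinsic parameters from M_hat = gamma * M."""
    # |gamma| is the length of the third row, since the third row of R has length 1.
    gamma = math.sqrt(dot(M_hat[2][:3], M_hat[2][:3]))
    M = [[v / gamma for v in row] for row in M_hat]
    # Sign sigma: T_z > 0 when the world origin is in front of the camera.
    sigma = 1.0 if (M[2][3] > 0) == origin_in_front else -1.0
    q1, q2, q3 = M[0][:3], M[1][:3], M[2][:3]
    Tz = sigma * M[2][3]
    r3 = [sigma * v for v in q3]
    # o_x = q1 . q3 and o_y = q2 . q3 (rows of R are orthonormal).
    ox, oy = dot(q1, q3), dot(q2, q3)
    # q1 . q1 = f_x^2 + o_x^2 and q2 . q2 = f_y^2 + o_y^2.
    fx = math.sqrt(dot(q1, q1) - ox ** 2)
    fy = math.sqrt(dot(q2, q2) - oy ** 2)
    # r1i = sigma (o_x m3i - m1i) / f_x, and likewise for the second row.
    r1 = [sigma * (ox * c - a) / fx for a, c in zip(q1, q3)]
    r2 = [sigma * (oy * c - b) / fy for b, c in zip(q2, q3)]
    Tx = sigma * (ox * M[2][3] - M[0][3]) / fx
    Ty = sigma * (oy * M[2][3] - M[1][3]) / fy
    return {"fx": fx, "fy": fy, "ox": ox, "oy": oy,
            "R": [r1, r2, r3], "T": [Tx, Ty, Tz]}


# Made-up parameters: a rotation about the y-axis and an arbitrary scale gamma = -2.5.
t = 0.3
R_true = [[math.cos(t), 0.0, math.sin(t)],
          [0.0, 1.0, 0.0],
          [-math.sin(t), 0.0, math.cos(t)]]
T_true = [0.5, -0.2, 4.0]
M_true = compose(800.0, 780.0, 320.0, 240.0, R_true, T_true)
M_hat = [[-2.5 * v for v in row] for row in M_true]
params = decompose(M_hat)
```

In this synthetic check `params` should reproduce the values passed to `compose` up to rounding, even though `M_hat` was scaled by a negative factor. With a matrix estimated from noisy data, the recovered rows of R are only approximately orthonormal, which this sketch does not correct.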

Info

Channel: UCF CRCV

Views: 38,554

Rating: 4.8837209 out of 5

Keywords: Camera, Model

Id: NWOL8yXL6xI

Length: 92min 13sec (5533 seconds)

Published: Mon Oct 29 2012