Photogrammetry I - 15a - Camera Extrinsics and Intrinsics (2015)

Captions
Welcome everyone. Last week we introduced homogeneous coordinates, and in this lecture we will use them more extensively than we have done before, because what we are aiming at today is describing the process of how a point from the 3D world is actually mapped into my image. This involves things like where the camera is in the world and where the objects are in the world, but it also involves the internal parameters of the camera. These two sets are called the extrinsics and the intrinsics: the extrinsics are the camera parameters which describe where the camera is in the world and where it is looking, and the intrinsics are the internal parameters of the camera, things like the camera constant or other elements of the camera which have an influence on how a point from the 3D world is mapped to the image plane.

The goal for today's lecture is that you understand the process of how a point from the 3D world ends up at a certain location in my image: what kinds of different cameras we have, what assumptions are involved in terms of distortions, ideally also what the distortions look like and how we can model them, and to get an idea of what we can tell about the objects in the scene given that we have a point mapped to a certain pixel location.

The overall process starts with the 3D world; we have a camera which takes an image of the 3D world, and that is the output image we get. One of the motivations for this work is that we want to estimate the geometry of the scene, and in order to do that we need to understand how points from the 3D world are mapped to my 2D image plane. There are several coordinate systems involved in this process, and we now have to go through the four different coordinate systems that we will use and map between in today's lecture.

The first one is an external coordinate system, the world coordinate system or object coordinate system; this is an externally given coordinate system. The second one is the camera coordinate system, which describes how the camera sees the world, so everything is expressed relative to the location of the camera; in the pinhole model, relative to the pinhole, because the pinhole is the origin of the camera coordinate system. The mapping between the world coordinate system and the camera coordinate system is typically a rigid body transformation, that is, just a shift in three dimensions plus a rotation. The next thing that happens is that, from the camera's point of view, we map the 3D world to the image plane, and this gives the image coordinate system or image plane coordinate system. Finally, from the image plane we map to the chip, the sensor of my digital camera (or to the film in the analog days, but here we are looking into digital cameras), so we map to the sensor frame where we have the individual pixels, and we can then say that a given pixel corresponds to a certain direction in the 3D world. So we need to know how to deal with these different coordinate systems.

If we look at the different quantities which are involved: we have the object coordinate system, which is a 3D coordinate system, so a point in the Euclidean 3D world is represented by X, Y, Z.
These are all capitalized because, according to the notation that we use, this indicates something in 3D. Then we have the camera coordinate system, where everything carries the superscript k (from the German Kamera); this is still living in the 3D world, but the k indicates that we are in the camera system. I could actually have put an o for the object coordinate system to be more consistent, but the convention is: whenever nothing is written, we refer to the world or object coordinate system. When we move on to the image coordinate system we go from 3D to 2D; that is where the central projection takes place, so we are living in a 2D world on the plane, and therefore these are lowercase letters, with a superscript c indicating that coordinate system. And then we have the sensor coordinate system: we are again in 2D on the sensor, but the sensor need not be perfectly aligned with the image plane, or at least there is definitely an offset between them, so we have to map between those points again. Again, if no index is given, this means we are in the object coordinate system.

Okay, so what we are doing now: we start in the 3D world and map down until we end up at the sensor, and we can describe that in the following way. We start with the X, Y, Z location of a point in the 3D world, with a 1 appended, because we are doing everything in homogeneous coordinates. Then we have the mapping from the object system to the camera system, which basically describes where the camera is in the 3D world. Then we have a second transformation which says: given that the pinhole is the origin and I am looking in the direction orthogonal to the image plane, this transformation maps the point onto the image plane. And then we map from the image coordinate system to the sensor coordinate system, and we end up with a 2D point, with a third dimension added for the homogeneous coordinates again. So overall we go from 3D to 2D, and the step where we actually drop from 3D to 2D is the central projection. These are the different steps that we take into account, and I now want to go through all of them. It is important to understand the indices: the sensor, the image plane, the camera itself, and the object system. We have to deal with those different coordinate systems and map between them.
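To make the homogeneous bookkeeping concrete, here is a minimal Python/NumPy sketch of the Euclidean-to-homogeneous conversions used all along this chain; the function names are my own, not from the lecture:

```python
import numpy as np

def to_homogeneous(x):
    """Append a 1 to a Euclidean point, e.g. a 3D point becomes a 4-vector."""
    return np.append(np.asarray(x, dtype=float), 1.0)

def from_homogeneous(xh):
    """Divide by the last component and drop it, e.g. a 3-vector becomes a 2D point."""
    xh = np.asarray(xh, dtype=float)
    return xh[:-1] / xh[-1]

# A 3D world point [X, Y, Z] becomes [X, Y, Z, 1]:
X = to_homogeneous([2.0, 3.0, 10.0])

# After the full chain we get a homogeneous 2D point [u, v, w];
# normalizing recovers the Euclidean position on the sensor:
print(from_homogeneous([4.0, 6.0, 2.0]))  # -> [2. 3.]
```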
Just one further illustration to hopefully make this clearer. X, Y, Z is the world or object coordinate system. Then we have the camera origin over here, with its own x, y and z axes. From there we can map to the image plane, shown here with the principal point and its x and y axes. And then we have the sensor system, because the sensor is potentially shifted with respect to the principal point: the (0, 0) of the sensor system is not the center of the sensor but, for example, the first pixel in the corner. So there are relations between those coordinate systems.

The first transformation, between the world and the camera, can be arbitrary, because I can take my camera to any point in the world and take an image. Between the camera system and the image system, however, things are fixed: given that we assume this line is perpendicular to the image plane, the x axis of the camera is aligned with the x axis of the image plane, the y axis is aligned with the y axis, and the distance between them is exactly c, my camera constant. So the origin of my image plane coordinate system is very strongly linked to the camera coordinate system: the direction of the x axis is the same, the direction of the y axis is the same, and the origin sits at (0, 0) in x and y; the only difference is that in the z direction I move a distance of c. So the origin of my image plane, expressed in the camera frame, is at (0, 0, -c). There is a fixed transformation between them; they are strongly linked. And of course we also have a fixed link between the image plane and where the sensor is actually mounted, because the sensor is installed in my camera. These are parameters which typically do not change; the chip is basically glued to the back of my camera, so unless I change the lens or seriously change the setup of my camera, this stays the same; the location of the sensor in the camera should not change even if I change my lens. The world-to-camera transformation, in contrast, is completely free and can change from image to image; the other things hopefully do not change from image to image.

We can also express the whole thing this way: from the object coordinate frame we map to the camera coordinate frame, and this works in both directions because it is a rigid body transformation. So the first step, from the object to the camera coordinate system, happens entirely in 3D. The next step, when I move from the camera coordinate system to the image plane, is the ideal projection, the central projection, which maps from 3D to 2D: from the 3D world expressed in the reference frame of the camera onto my image plane. Then we map from the image plane to the sensor, and everything there happens in 2D. And then there is something I have not talked about yet: we stay in the sensor frame, but there is one additional transformation we apply, which accounts for nonlinear errors. Everything up to that point is a linear transformation, but there are things like errors in my lens, barrel distortion being the prominent example, that I want to handle with an additional, nonlinear transformation which describes these additional errors. In the standard process, if we had a perfect lens and only linear errors, we could actually stop before this step; but if I work with real cameras, I have to take this element into account.

The first part of this chain is the so-called extrinsics, the external configuration of the camera with respect to my world coordinate system; and the rest are the intrinsic parameters, or intrinsics, which are internal and specific to that camera and which hopefully do not change, or at least do not change from image to image. Of course, if I change parameters of my camera or change the lens, I get a different camera constant and the intrinsics change. So these are the extrinsics, and this is the pose, where pose refers to position and heading of the camera; and the intrinsics are the internal camera parameters, which tell me how the object in front of the camera is actually mapped onto the sensor.
Yes, please? [Student question.] Yes, if you turn your focal length you will basically change the camera constant, in the way you would with a zoom lens, for example. It is not very easy to express this exactly, because you do not know exactly where the center is if you have multiple lenses, but what you basically change is your camera constant, so this part over here. So with a zoom camera this may change from picture to picture, but hopefully the other parameters, how your chip is glued inside your camera, do not change.

Note that I am not introducing any discretization effects in this transformation. Even in the sensor system we do not discretize, so we still have sub-pixel positions. Of course the intensity values that I get there are in the end discretized and also quantized, as we discussed at the beginning of the course, but in this mapping of a point from the 3D world onto the sensor, nothing is discretized.

Okay, any further questions about the overall setup of what we are going to do today? We start from here and move down until we are there, and then we turn around and try to go the way back, to get the mapping from one side to the other. The forward direction will take much longer; the way back will then be rather quick. Of course there is a loss of information on the path from the left-hand side to the right-hand side, so we will not be able to recover all the information.

[Student question.] Yes, that is exactly the point: how can I go back? If I start from 3D and move to the 2D world, there is a loss of information. Some of the transformations are invertible, so for those I can go back, but in the central projection step I cannot recover all the information. Still, I can say something about the object in the world, and we will see how we can invert this process: we definitely will not find a unique solution, but we can describe the solution space and relate it nicely, for example, to the camera parameters that we have. Any further questions so far?

We do not start with the intrinsics block; instead we start from the world side and work our way down, so we start with the extrinsics. That is actually a pretty easy task. What the extrinsics describe is where the camera is in the world, in terms of position and heading, and this is a rigid body transformation, which is invertible: if I know where the world coordinate system is and where the camera is, I can express where a point should be as seen from the camera; and if I know where the camera sees a point, I can express where it is in the real world. So how many parameters are needed to describe this transformation, for the extrinsics? Exactly: I need six parameters, three for the position X, Y, Z and three for the orientation, the heading, where the camera is actually pointing. That should not be a new concept.

So let us start and say we have a point P in the real world, and let us see where this point ends up with respect to the camera system. We have our point X_P, with X, Y and Z coordinates; the subscript P just refers to my point P. And I know the location of my camera in the world coordinate system, that is,
where the pinhole is: at which X, Y, Z location the pinhole sits in the world reference system, and this is X_O. So these are three degrees of freedom; I know the location of the camera in the world. I also said that we need to know the heading, where the camera is pointing, and this is something which we typically encode with a rotation matrix R, which maps from the object coordinate system to my camera coordinate system. Then we can describe the transformation by which a point X_P in the world coordinate system is mapped to a point X_P^k in the camera coordinate system, seen relative to the camera: we simply subtract the origin of our camera and then apply the rotation matrix, and the point is mapped from the object coordinate system into the camera coordinate system. Is everyone happy with this equation over here? Is this okay?

[Student question.] No, that is not X_P; X_O is the origin of the camera, so it is the location of the pinhole expressed in the world coordinate system. If this here is the center of the world coordinate system and I am standing here with my camera pointing at you, then X_O would be, say, 80 centimeters in X, again 80 centimeters in Y, and 60 centimeters upwards, roughly at the focal point, this point over here. What we want to describe is how a point, say the object point over there, is mapped from the world coordinate system to how it is seen from the camera, from this point over here. And what I also need to know is where the camera is pointing: whether I point it at the ceiling, the ground, or towards you leads to different coordinates expressed with respect to my origin, and this is expressed by the rotation matrix. So if I want to map you from the world coordinate system to the camera coordinate system, I take your coordinates in the world coordinate system, subtract the location of the camera, which is the difference here, then apply the appropriate orientation, and I obtain your coordinates expressed with respect to this point over here. Any further questions?

Okay, so this was done in Euclidean coordinates; let us do it in homogeneous coordinates. This was my point in Euclidean coordinates, so I expand the vector by a 1. Then the mapping can be expressed as a rotation matrix with no translation (and no projective parameters, which would make it a general projective transformation), followed by a second transformation which is just a shift: no rotation, so a 3-by-3 identity matrix, a zero vector down here, a 1 here, and here minus the location of the origin of my camera; this is applied to my point in the world coordinate system expanded by a 1. Now I can multiply those two matrices and get a single transformation. With this transformation I can express the mapping very compactly, as we wrote at the beginning: the point P in the world coordinate system goes through this transformation H, from the object frame to the camera frame, giving me the point in the camera system; and this transformation matrix H is exactly this matrix over here. So this side is expressed in Euclidean coordinates, and this side is expressed in homogeneous coordinates.
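As a small sketch of the extrinsics just derived, in Python/NumPy and with made-up example values for R and X_O, the Euclidean form R(X_P - X_O) and the homogeneous matrix H = [R, -R X_O; 0 0 0 1] give the same result:

```python
import numpy as np

# Assumed example extrinsics: camera at X_O, rotated 90 degrees about the z axis.
X_O = np.array([0.8, 0.8, 0.6])        # projection center in world coordinates
t = np.pi / 2
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])   # world-to-camera rotation

X_P = np.array([2.0, 1.0, 1.6])        # point P in world coordinates

# Euclidean form: subtract the camera origin, then rotate.
X_P_cam = R @ (X_P - X_O)

# Homogeneous form: one 4x4 matrix H doing the same rigid body transformation.
H = np.eye(4)
H[:3, :3] = R
H[:3, 3] = -R @ X_O

X_P_h = np.append(X_P, 1.0)            # expand the point by a 1
assert np.allclose((H @ X_P_h)[:3], X_P_cam)   # both routes agree
```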
A remark on the fonts on the slide: this symbol is slightly slanted to the right and this one is not. For the capitalized characters it is harder to see than for the lowercase ones, but everything which is non-slanted, non-italic, is in homogeneous coordinates, and everything which is slightly slanted, like this character over here, is in the Euclidean world.

Yes, please? [Student question about the order of the matrices.] So, matrix multiplications are not commutative; I cannot change the order. But whether I first multiply those two matrices together, or first multiply the point by one matrix and then the result by the other, does not matter. To be clear, this is not a matrix, this is the point, my input point in the world coordinate system; and this is my translation and this is my rotation. What I do is: given a point, I first subtract the offset of the camera in the world coordinate system, and then I do the rotation; this is the order I need. So I can either multiply the point first with this matrix and then multiply the resulting vector with that matrix, or I can multiply the two matrices first, which gives me this combined matrix, and then multiply it with the point; the result is the same. Any further questions at this point?

Okay, with that we are done: we have specified the transformation from the world coordinate system to the camera coordinate system; it is a rigid body transformation, six degrees of freedom, and it is clearly invertible. So that part is done. Let us look at the intrinsics, the interior orientation, which is all the remaining part. We now have our point expressed in the coordinate system of my camera, and we need to map it first to the image plane and then to the sensor. So we start with the projection onto the image plane, then the transformation to the sensor, and then we account for the nonlinear errors. Again, some of those transformations are invertible: if I am on the image plane already, I can map to the sensor and back; and I can also map back and forth through the nonlinear correction, given knowledge of my nonlinear function. But there is a loss of information in one step, due to the central projection, where I map from the 3D world to the 2D world.

How does that work? I do not want to go into all the details, but we typically have our image plane, where the sensor is mounted; in this image plane there is a point which is the principal point, with a 90-degree angle here, and this is the direction the camera is looking in. Then I have a distance between the origin of the camera, the points K1 and K2, which for a thin lens are assumed to be the same point, and the image plane; the distance from the origin to the principal point is exactly my camera constant c. In the perfect case, the optical axis and the direction the camera is pointing in are the same, though here it is drawn with a small deviation. The other points and elements in the drawing we are currently not that interested in. If we have a point P in the 3D world, we first map it onto the image plane, and then at the end there is a correction based on the location of the sensor and on the errors my system may have, something nonlinear like lens errors, which is the last transformation.

Let us go into the details. We split the overall process into: first, the projection onto the image plane without considering any errors, using an ideal perspective projection; second, accounting for where the sensor is mounted on the image plane; and third, we
compensate for the remaining errors. Again the notation: P is the point in the world, or in the camera frame; everything which has an overline is on the image plane or the sensor; and the primed variables are those which take the nonlinear errors into account.

Our assumptions are: a distortion-free lens; the focal point and the principal point lie on the optical axis; all rays are straight lines and pass through K1 and K2, which are exactly the same point, and this defines the camera coordinate system; and the distance from the camera origin to the image plane is my camera constant. (I wanted to say the assumption about the optical axis is not true in reality, and it is not; but it holds for the first step, the ideal projection. We assume it is the case and then compensate afterwards for the fact that in reality it is not.)

There are two ways I can describe the projection. I can use the physically motivated model of how this actually happens in reality: we have our point in the real world, the ray passes through the origin of the camera and is then mapped to the image plane behind it; in this case the camera constant is in the positive z direction of my coordinate system, so c is larger than 0. Or I take the convention which is often used in image processing and computer vision, where we basically just flip the image plane from one side of the origin to the other, so the point P' is on the negative side of the z axis; the point is then projected onto this image plane, and the image also does not stand upside down, as it would in the first case. Things work exactly the same way, just with c larger than 0 in one case and c smaller than 0 in the other. We stick here with the second one. (Equivalently, instead of a flip you can see it as a rotation of the image plane by 180 degrees; the math stays exactly the same.) So we use the convention with the camera constant smaller than 0 throughout this course: c is negative, though it depends on whether you treat c as a distance or as a coordinate along the axis, but we use c smaller than 0.

Okay, so let us go through the projection. The projection is directly obtained through the theorem of intercepting lines, as we have already used it in an earlier chapter for a single camera. We have our point in the camera coordinate system; where this point is mapped to depends on the camera constant and on the distance of the point from the camera, its z coordinate. If I go back one slide, I basically have this example: this is my point P in the real world, and it is mapped to the point P-bar on the image plane. If this point moves further away, the image point gets a smaller x coordinate; and if I change the camera constant, this element gets larger or smaller accordingly. Through the theorem of intercepting lines I can compute that the x coordinate of my image point is the x coordinate of the 3D point divided by its z coordinate, times my camera constant; the same holds for y; and if I do the same for z, I get z divided by z times c, which directly gives me c. So the third coordinate is always mapped to c.
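Written out (my rendering of the slide's notation, with the superscript k marking camera-frame coordinates), these intercept-theorem relations are:

$$
\bar{x}_P = c\,\frac{X_P^{k}}{Z_P^{k}}, \qquad
\bar{y}_P = c\,\frac{Y_P^{k}}{Z_P^{k}}, \qquad
\bar{z}_P = c .
$$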
This makes sense, because it is the distance from the origin to the image plane: in 3D coordinates the third coordinate would be c, because all the depth values are mapped onto my camera constant. This is what describes the projection.

I can now express this in homogeneous coordinates. To my 3D homogeneous coordinates I apply this one transformation matrix, with c, c, c on the diagonal, which maps the point to a new point, up to the usual homogeneous scaling. Then I drop the third coordinate, this row over here, because the z coordinate gets eliminated, everything is mapped to c, and I obtain this matrix, which is the same as the previous one except that the third line is missing. So I have a four-component homogeneous vector which is projected to the 2D world, and I get three components in homogeneous coordinates, which define the projected point, in homogeneous coordinates, in my camera's image coordinate system. Is that clear? What we have done is: one step which scales our image according to the camera constant, and then we eliminated one dimension, because we map from the 3D world to the 2D world. If you apply this transformation, you get exactly this result.

Yes, please? [Student points out an error on the slide.] Yes, that is absolutely a mistake; this entry should sit over here, I am sorry. This was a LaTeX copy-paste mistake when I was preparing the slides; the 1 must be here. Let us check, let us apply it: what happens if I multiply this matrix with this vector? The first component, this row with this vector, will be c times X_P^k. The next one, this row with this vector, gives c times Y_P^k. And this row with the vector gives Z_P^k. So I get the homogeneous vector [c X_P^k, c Y_P^k, Z_P^k]. If I now normalize this vector, since it is a homogeneous vector, I divide by the last component, and what I get is c X_P^k divided by Z_P^k, c Y_P^k divided by Z_P^k, and 1. And this is, if we go back, exactly the result we had over here. So I am happy: the z coordinate is gone due to the projection, and we have just those two coordinates, which are the correct ones, expanded by a 1, so already normalized. What is written here is correct, except that the 1 must be here and not there. Sorry for the recording, and sorry for the confusion. At least from this calculation you should have seen that this exactly reproduces the result of the theorem of intercepting lines, so it was not too bad to do the calculation. Any further questions?

So, in homogeneous coordinates we can express this transformation quite easily: my point in the camera coordinate system, multiplied by the matrix P which maps from the camera to the image, exactly as given here, yields the projected point in the image plane.
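As a minimal sketch of this ideal projection in Python/NumPy (the 3-by-4 matrix is the one with the third row dropped, as just described; the value of c is made up, negative per the sign convention of this course):

```python
import numpy as np

c = -0.05  # camera constant; assumed example value, negative per the convention above

# Ideal projection, camera frame -> image plane, in homogeneous coordinates.
# The row for z is dropped; the last row keeps z as the homogeneous scale.
P_cam_to_img = np.array([[c,   0.0, 0.0, 0.0],
                         [0.0, c,   0.0, 0.0],
                         [0.0, 0.0, 1.0, 0.0]])

X_cam = np.array([2.0, 3.0, 10.0, 1.0])   # homogeneous point in the camera frame
x_h = P_cam_to_img @ X_cam                # -> [c*X, c*Y, Z]
x_img = x_h[:2] / x_h[2]                  # normalize: [c*X/Z, c*Y/Z]

# Matches the intercept-theorem result:
assert np.allclose(x_img, c * X_cam[:2] / X_cam[2])
```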
Now I can combine things. Given that I have the ideal camera which just does the ideal projection, I can combine this matrix with the transformation we had before, from the world coordinate system to my camera coordinate system, and say: this is now P, mapping from the world coordinate system all the way to the image plane. So it is the combination of the extrinsics, the rigid body transformation we identified, and the projection we just derived; we simply multiply those two matrices and we have the mapping from the world coordinate system to the image plane.

[Student question about the dimensions.] This is a rotation matrix, which is 3-by-3; this is a three-element vector, transposed; so together this is a 4-by-4 matrix, and I can multiply the projection matrix, which is 3-by-4, with this 4-by-4 matrix. And good that you raise it, because it comes up again very soon.

As a result, we can express this matrix K as the calibration matrix of the ideal camera, where only the camera constant matters and nothing else; this is what is called the ideal camera. Then we can write the combination of this calibration matrix, the projection, and the exterior orientation, the extrinsics, in one expression: the matrix P is K times R times this somewhat unusual form. Have you seen this form for writing a matrix, with the vertical bar? It basically means we are stacking together different elements: we stack this rotation matrix, a 3-by-3 matrix, with a vector, and the bar is just written to clearly separate the block matrix from the vector. Here this block is just minus R times X_O, a 3-by-1 vector. I can also pull the rotation matrix out of this; then the matrix block turns into the identity and the vector is just minus X_O, which is what is written up here: a 3-by-3 matrix, in this case the identity, stacked with the vector given by minus X_O. So these elements are 3-by-4 matrices, because each is a 3-by-3 matrix stacked with a 3-dimensional vector. This notation is used very frequently because you can write matrices that are stacked together from different blocks quite compactly; it just makes the notation easy: I can stack a matrix with a vector.

Okay, good. So what we have is exactly this form, P = K R [I | -X_O]. K is always called the calibration matrix, because the idea of the calibration matrix is to model the intrinsic parameters of my camera; this matrix will gain more and more elements over the course of today's lecture. Given this transformation, we can map any point from the 3D world onto my image plane, everything expressed in homogeneous coordinates. If you write all of it out, you get this expression over here: the calibration matrix, the rotation matrix, multiplied by the identity-and-vector block, with all the elements computed. If I then turn this into Euclidean coordinates, that would be the result written in Euclidean coordinates, with all the different elements in it. If you do the derivations entirely in Euclidean coordinates, as done here, you are, I guess, more likely to make a mistake than if you use homogeneous coordinates in this form, because the transformations can be chained so nicely, which is one of the big advantages and really simplifies the math; if you write it down in this form on a piece of paper, you are much less likely to make a mistake.
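Putting the pieces together, here is a sketch of the full ideal-camera mapping P = K R [I | -X_O] in Python/NumPy, reusing the assumed example values of c, R, and X_O from the sketches above:

```python
import numpy as np

# Assumed example values, as in the earlier sketches.
c = -0.05
X_O = np.array([0.8, 0.8, 0.6])
t = np.pi / 2
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])

# Calibration matrix of the ideal camera: only the camera constant matters.
K = np.diag([c, c, 1.0])

# P = K R [I | -X_O], a 3x4 matrix mapping homogeneous world points to the image plane.
P = K @ R @ np.hstack([np.eye(3), -X_O.reshape(3, 1)])

X_world = np.array([2.0, 1.0, 1.6, 1.0])   # homogeneous world point
x_h = P @ X_world
print(x_h[:2] / x_h[2])                    # Euclidean coordinates on the image plane
```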
Written out in Euclidean coordinates, these are the so-called collinearity equations, for mapping a point from the world coordinate system onto my image for the ideal camera.

So, in the overall process, this part is now done; this is just what we did: the rigid body transformation, which is the extrinsics, and the central projection. There are two things still missing. We will now add an affine transformation, which gives us the so-called affine camera, because the mapping from the image plane onto my sensor is an affine transformation; and then, due to errors such as lens errors, we have nonlinear effects, for which we add an additional transformation. So what we will do now is move from here to here, and I will continue this process towards the end. Any questions up to this point? If not, we make a five-minute break, let some fresh air in, and then continue.
Info
Channel: Cyrill Stachniss
Views: 21,789
Rating: 5 out of 5
Keywords: robotics, photogrammetry
Id: DX2GooBIESs
Length: 43min 13sec (2593 seconds)
Published: Thu Jul 09 2015