ENB339 lecture 9: Image geometry and planar homography

Captions
Okay, we might get started. This is the second lecture about image geometry. Last Thursday we had a bit of a rave about the geometry of image formation; we talked about three dimensions and two dimensions. I'm going to revisit some of the things we did in that last lecture and then get through to one piece of information that you're absolutely going to require to complete the prac. It's the mystery of the blue dots: why have you got blue dots on the sheet? Today we explain why you've got blue dots.

So, to recap what we did in the last lecture: we talked about how we can represent three-dimensional things so that they look three-dimensional on a plane, and it's all to do with obeying the laws of the geometry of image formation. If you don't obey these laws, things are not going to look very three-dimensional. It took a long time for human beings to figure out the geometry and draw things on a plane surface that actually look like they've got depth to them, and there were some street art versions that we talked about last Thursday as well.

We talked a bit about pinhole image formation. If you've got a dark room with a tiny little aperture in the wall, then an inverted image of the world outside forms. It's called a pinhole camera; some of you may have made one when you were a kid. The geometry of this image formation is really pretty simple: it's all about similar triangles. But we also talked about the fundamental problem with a pinhole camera, which is that it creates images that are very, very dark. (Yeah, I can stand up but I can't walk. This is progress.) The images are very dark because not much radiant energy from the world can get through that tiny little hole; most of it smacks into the wall. So we form an image using a lens. A lens, being a big chunk of glass, allows a lot more light to come through, so
we have this thin lens model that you've probably met in high school physics. It allows you to create much brighter images, but it's got a focus problem: it's possible that the image will be out of focus. You have to adjust the position of the lens with respect to the surface on which the image is formed in order to achieve a beautifully focused image. The thing is, you generally can't have everything in the image in focus. You can have one thing in focus and the rest is blurry, and you can adjust what's in focus, but you can't have everything in focus simultaneously. So last week we talked about some alternative sorts of cameras, light field cameras, which allow you to capture the whole light field and retrospectively focus the image. Did anyone go out and buy a light field camera? How much are they? Okay, all right. So there were some equations for the thin lens model.

A consequence of forming an image, of taking the three-dimensional world and crunching it down so that you have an image on a plane, is that we've removed a dimension. We've gone from 3D to 2D; we've removed the dimension of depth, and the consequence of that is that things look a little odd. Lines that are really parallel in the world become not parallel in the image; things that are circles in the world become not circles in the image. This is so familiar to us that we don't give it a second thought. We don't think it's weird, because we've got perspective cameras in our heads and this is the way things look to us, but if you think about it, it's a little odd. It's a consequence of losing a dimension: you can't throw away a dimension casually and not have consequences, and these are the consequences.

I want to touch briefly on some really fundamental geometry, so again, hopefully you know this from high school and from your education here at QUT. These are really familiar concepts from geometry and we're just going to extend them a little bit. So we have the concepts
of what's called Euclidean geometry: a Euclidean plane, which is a plane where the rules of Euclidean geometry apply. And we have Cartesian coordinates, which is the idea where you've got an x coordinate this way and a y coordinate orthogonal to that. It was developed by a guy called René Descartes, a philosopher and mathematician and mercenary, which is an interesting job combination. He was the first person to realize you could represent a point as two coordinates against a pair of orthogonal axes. The story is that he was lying in his bed one morning (he liked to sleep in) and he saw a fly on the ceiling, and he realized that he could represent the position of the fly by its distance from the two walls. That then turned into this concept of Cartesian geometry, which is named after him.

So if we're doing two-dimensional Cartesian geometry in the plane, we have a point and it's got an x and a y coordinate. We can represent a point as a vector that goes from the origin to the point, or we can represent it by two numbers, its x and y coordinates. This should be very, very familiar to you. So here's our point and we can represent it by two coordinates; we say it exists in the space ℝ². It's represented by two real numbers; that's what ℝ² means.

We can also represent it in a homogeneous form, so we represent a point that's just got two coordinates by three numbers. You might think this is not very useful, but I'll show you in a few minutes why it's useful. We touched a bit on it last week and we'll talk a bit more about it today. We say that this point exists in the space ℙ²: it's a projective coordinate, and a two-dimensional projective coordinate is represented by three numbers. Now, to convert a point from Cartesian to homogeneous, we just append a 1 to it. So if x and y is the Cartesian coordinate, the homogeneous coordinate is (x, y, 1). What we do
is whack a 1 on the end. Now, homogeneous coordinates I tend to represent with a little tilde on top, so here's the homogeneous coordinate of a point, p with a tilde, and it's represented by three numbers: x tilde, y tilde and z tilde. To convert it back to a Cartesian coordinate I divide the first two numbers by the third number. This is really important: we take the first two numbers, normalize them by dividing by the last number, and we get the Cartesian equivalent. So we can convert from Cartesian coordinates to homogeneous, and from homogeneous back to Cartesian.

A whole lot of stuff becomes really easy when you think in terms of homogeneous coordinates, and what's really interesting in homogeneous coordinates is that points and lines are almost the same. We say that they're duals, and I'll show you a cute trick. It's not examinable, but it's really cute. Here's a line in homogeneous coordinates. I can represent a point by three numbers and I can represent a line by three numbers, so there's a line represented by three numbers, l1, l2, l3, and there's a point. I can define the line as the set of all points for which the dot product of the line and the point is equal to zero. We call that the point equation of a line, and its advantage compared to the form of the line that you probably know, y = mx + c, is that this form allows you to have a line that's vertical. In that Cartesian representation, if the line's vertical, m goes to infinity and it all gets a bit ugly. In homogeneous coordinates we represent a line by three numbers, and you can represent a horizontal line, a vertical line, or a line at any slope at all.

Now, if I've got one point represented in homogeneous coordinates and another point represented in homogeneous coordinates, then the homogeneous line that goes through them is the cross product of the two points. In homogeneous coordinates this all becomes
delightfully easy; it's much more complicated to do this in the usual two-dimensional representation of lines and points. Similarly, if we've got two lines represented in homogeneous form, the point where they intersect is the cross product of the two lines. So there's a gorgeous symmetry here between lines and points: we can construct lines that join two points, and points that are formed by the intersection of two lines. As I say, it's not examinable; it's just to try and show the power of this homogeneous representation of points and lines.

When we do that, we can represent the pinhole model of a camera, this thin lens model, by this homogeneous equation. We have the homogeneous representation of the point in the world, which is X, Y and Z with a 1 on the end. This is a three-dimensional Cartesian point in homogeneous form, so I've just put a 1 on the end of it, and that's the vector that's on the right. We have a matrix in the middle with mostly f's and noughts and a 1 in it, and on the other side we have the homogeneous coordinate of the point on the image, which is represented by three homogeneous coordinates. To convert from those homogeneous coordinates on the image back to the Cartesian coordinates on the image, we use this relationship. So what we've done is take the world point (X, Y, Z), turn it into a homogeneous vector, multiply it by a matrix and then convert it back to Cartesian form, and with that we can model the effect of a camera, this perspective projection that crunches 3D into 2D. So now we can represent the image formation process; there's the x and y on the image plane. We've eliminated the divide-by-Z problem that normally comes when you represent these equations in Cartesian form.

Now go back to this relationship that I had just before: we can factor that matrix into two matrices. The matrix that's on the right does the scaling; it's all
about the focal length. You know that with a camera, if the focal length of the lens is big then it's going to do zooming; it's going to zoom in on something that's far away. If the focal length is small, it's going to take a very wide-angle view and things are going to appear smaller in the image. This is all stuff that we touched on in the last lecture. That matrix converts three dimensions to two dimensions.

There's another way we can represent all this, and it's what's called the central projection model. Before, we had a model where the light ray came in through a lens and focused on a plane on this side. It's quite convenient to represent it in a slightly different form, where the ray comes from the world and pierces the image plane on its way to the origin. So you draw a line from the world point to the origin, it comes like this, and it intersects the image plane. The equations are exactly the same, but it's quite common in many textbooks to write the image formation process using this central projection model. That's the maths for it up the top there.

Now, this central projection model is a little bit illustrative. Normally, when the ray comes in it hits the image plane at a particular point, and the image plane is generally quite small in a camera. The image plane might be a little square, kind of that size if it's film; it might be only six millimeters square if it's a sensor chip in your camera. It's a very little sensor that the image is formed on. So if something's maybe two meters high in the world, on the image plane it's only going to be a few millimeters tall; it's shrunk. So you all get that? Okay.

In a digital camera system the image is going to fall onto an array of pixels, and we've talked about pixels and arrays of pixels before. So what we're going to do is consider that the image is now going to fall onto a grid, and we want to know which
grid cell the ray falls on, because that's the pixel that the light ray is going to illuminate. Now, we've talked about images having a different coordinate system. With an image we have the coordinates (0, 0) up in the top-left corner; the u coordinate increases this way and the v coordinate increases down the image. We've done that when we've been doing image processing in the pracs and tutes, and talked about it in the first few lectures. So what we want is some transformation between the distance in meters at which the ray lands on the image plane and its coordinate in terms of pixels; we want to scale it from meters into pixels. We do that using this linear transformation at the top. What we do is take the size of the image and divide it by the height of each pixel, and that converts the height of the image in meters on the image plane, which is really small, into a number of pixels. Then we also want to shift the origin from the middle of the image to the top-left corner. That matrix up the top there does both of those things: if you work in homogeneous coordinates it will both scale the image and shift it. The values u0 and v0 there are the coordinates of the center of the image, so if your image is 1024 by 1024 then u0 and v0 will be 512.

So what we're doing is building up a number of transformations, and we can stick them all together. What we end up with is this relationship here, which I went over really quickly last time. There are three matrices all in a row. On the right-hand side we have the point in the world, its X, Y and Z coordinates turned into homogeneous form. Then we're going to multiply by three matrices and we're going to wind up with the homogeneous coordinates of the pixel in the image. Those two matrices are called the intrinsic parameters of the camera; they are things that
depend only on the camera. There are two parts to that. There's one part that depends on the focal length, like how much zoom your camera's got; that's the middle matrix. The matrix on the left-hand side is doing the scaling; it's got information about how big the pixels are and where the pixel array is with respect to the lens. So those two matrices describe things that are innate in the camera, a fundamental characteristic of the camera.

The next matrix along, the one with the R and t, says whereabouts the camera is in the world. The R matrix is a rotation matrix, which you might have covered with Michael way back at the beginning; it says what the orientation of the camera is, which way it's pointing. And t is where the camera is in the world. So the image that you form with your camera depends on a lot of things: it depends on where the camera's pointing, where it's situated, what the zoom or focal length is, and on the size of the pixel array. The image formation depends on all of those parameters, and that's how it can be factored. Those ones are called the extrinsic parameters; they depend only on where the camera is, not on the characteristics of the camera.

All those matrices together are called the camera matrix. It's very, very difficult to derive this from first principles, so generally what you do is perform some calibration process and estimate it. You just have a matrix with twelve numbers in it, a 3 by 4 matrix: you put the three-dimensional coordinates in one side and out the other side comes the image plane coordinate.

So what I'm going to do is a little exercise for you, so you'll need some pencil and paper. Here's a camera matrix, and I'm asking you to determine the image plane coordinate of the world point (4, 0, 0); that's got an x coordinate of 4, and the y and z coordinates are
equal to zero. You can work through this at your leisure. All right, so that's the camera matrix and that's the calculation that you want to do. If you were using MATLAB you would have got a three-vector that looks like that. But it's a homogeneous coordinate, so when we want to convert it back to a Cartesian coordinate we've got to divide the first two numbers by the last number, divide them all by four. We find then that the point in the world that's got an x coordinate of 4 and y and z coordinates of zero appears as a pixel at the horizontal coordinate 712 and the vertical coordinate 912. So if the image was 1024 by 1024, a megapixel image, it would be down in the bottom right-hand corner; that's where that point would be projected to. So there you go, that's how you can model the projection of a camera. That equation we had back here, we're going to revisit in a few moments. So that's how you can do it in MATLAB.

Now what's interesting is that I've introduced an extra scale factor here; I've put a lambda in the front there, so it's multiplying everything by lambda. If we do that, then the u tilde, the v tilde and the w tilde will all be scaled by lambda. We agree on that? So when we convert back to a Cartesian coordinate, because we divide through by the w tilde, the lambda disappears. You can multiply that matrix by any arbitrary scale factor and you'll get exactly the same result. This is a slightly unusual thing about homogeneous coordinates: they don't care about scale, and the result will be unchanged. Because that's the case, we typically write it like this. There are twelve numbers in that matrix, but because the scale factor is completely arbitrary, by convention we make the bottom right corner equal to one. We could make it equal to anything, but by
convention we make that one equal to one, so then there are only eleven other numbers that we need to figure out. So that's a characteristic of homogeneous coordinates: scale factors do not matter.

Now, in the very lovely MATLAB toolbox there is a function that generates a camera object. Here I've generated a camera that's situated at a position with XYZ coordinates of 0, 1 and 2, and then I've rotated the camera so that it's pointing along the x axis of the world. So I've got a camera which by default points along the z axis; I've shifted it and rotated it so that it's looking along the x axis of the world. That's what those rotations on the end do. I've created a camera model in MATLAB, and then I can ask it for the C matrix, the camera projection matrix, and there it is. This is the one you just used in that exercise. I can give it a point and call its project function, and the project function of this object turns a 3D point into a point on the image plane. I can also ask MATLAB to plot the camera for me and plot the object, so you see a little tiny camera icon and a little icon representing the point. I can see where the camera is, which way it's looking, and where the point it's seeing is. So I can visualize the camera observing the world in MATLAB.

Okay, what we're going to do now is a simplification of what we just did. I'm going to consider that my camera is looking at a point, and the point's on a plane in the world. I can arbitrarily choose where the origin of my coordinate system is, so I'm going to put the origin of my coordinate system on the plane, and the x and y axes of my coordinate system are in the plane. So z is pointing out of the plane, and x and y are within the plane. That's it up the top there. I can put the origin anywhere I like, but I've decided I'm going to put it there, on the
plane. That means that for every point that lies on the plane, its z coordinate is equal to zero. Get that? Down the bottom, then, is the projection matrix, that matrix with C11, C12 and so on. Those numbers just depend on where the camera is looking and on the coordinate frame. And now instead of X, Y, Z and 1, I've written X, Y, 0 and 1, because Z is always equal to 0 in this particular example. If that's got a 0 there, then it's going to multiply the third column of the matrix; that column is going to be multiplied by 0, so I really don't need that column or that number there. So I can remove a column of the C matrix and I can remove a row of the vector on the end, and now I'm left with a 3 by 3 system. That matrix is generally called H; it's called a planar homography, and what it does is map points in a plane to points in the image. There's no depth here anymore. So if I know the two-dimensional coordinates of a point in an arbitrary plane in the world, I can convert those, just using that simple matrix, into the coordinates of a point in the image. This is going to be really useful to you; homography is going to be your friend by the end of the prac.

I've written the homography here with a 1 in the bottom corner. The 3 by 3 matrix has got nine numbers in it and one of the numbers is equal to one, so there are only eight numbers that we need to figure out; it's got eight unknowns. Now, if we've got a point on one plane and the corresponding point on the other plane, that gives us two pieces of information. So four points on this plane and four points on this plane that I know give me enough information to be able to estimate this matrix H.

Here's an example of something you can do with this homography technique. I took a picture of Notre Dame cathedral, as one does, and it's big. You're standing there in the square, you're looking up at Notre Dame cathedral, and you take
a picture like that. It's pretty distorted, because I'm standing on the ground looking up at something that's pretty massive. You can see it's distorted: the sides are actually vertical, but it's got this foreshortening effect, so the sides appear to converge. But I know some things about the cathedral. I know that these four points here lie on a plane; they lie on the frontal plane of the cathedral, more or less. So I pick those four points, and I can draw a shape through them, and I can say that that shape really should be a rectangle. If I was looking at it properly, that shape would be a rectangle. So I've got four points on the front of the cathedral and I've got four places where I'd like those points to be in my ideal image of the cathedral. Two of the points haven't moved, but two of the points do move.

What I can do now is estimate a homography between one set of points and the other: four points that belong to one plane and four points that belong to the ideal plane, the ideal representation of the cathedral. In MATLAB there's a function that allows me to estimate a homography. I've got the two sets of points: I picked the pixel coordinates off with the idisp function, and then I figured out where I'd like the red dots to be. I estimate the homography matrix, and there it is. Now I can use this homography matrix to warp the image. What I've done is apply that homography to every point in the image, and I've straightened it up.

Now, the straightening isn't entirely valid. All I've said is that the frontal plane of the cathedral, which I know is a plane, has four points that should be on a rectangle. If I looked at it from a cherry picker they would have been a rectangle, and I've forced them to be that way, but it's going to introduce
some distortion. For all the points that don't lie in the plane where I've done this business, there's going to be quite a bit of distortion. If you look at this more closely you'll actually see that on the sides of the bell tower there's some distortion, because those points don't lie in the plane. My photograph doesn't have the full three-dimensional representation of the cathedral, so there's no way I can get a perfect representation of the cathedral from a different viewing point. This is the best I can do, but it's only correct for those points that lie in that frontal plane of the cathedral; everything else will be somewhat distorted. But it's not a bad trick. In MATLAB I can do that just using that function there called homwarp.

Warping is an interesting business; it's used a lot in special effects. Here are the two images, the output image on the left and the input image on the right. For every single pixel coordinate in that left-hand image, I've multiplied it by the homography matrix H and figured out what the coordinate should be in the input image. I've just given one example: I picked a pixel up the top there at coordinates (600, 100), I've passed that through the homography matrix, and I end up with the result 757 and 51.3. It's saying that the pixel in the left image belongs at that fractional coordinate in the input image. I can't look up a fractional coordinate in the input image, but I can interpolate: I look at the neighbors at integer coordinates and do some interpolation, some weighted averaging, to figure out what the value would be at (757, 51.3). I do this process for every single pixel in the output image: I take its coordinates, put them through the homography, do the interpolation and then set the pixel in the output image. So it's a little bit time-consuming, but on a reasonable computer
it goes like that. So what I've done is use some very simple geometric information, this notion of a homography, and I've used my domain knowledge about cathedrals and planes to be able to do this trick and rectify the image. What it's equivalent to is really moving my camera from the blue coordinate frame, on the ground looking up like this, to a virtual camera that is up in the air looking at the cathedral straight on. And I can actually decompose that homography matrix in some ways and find out the angle I was looking at the cathedral at. So after I've come home with this image, I can work out the angle at which I had my camera tilted up above the ground, given that the front plane of the cathedral is vertical. It's an old cathedral, so that may not be a valid assumption. So this is some simple information that you can back out of just a digital image.

Okay. The Olympics were on recently, and sitting down at the television you see pictures of swimming pools and you see amazing things: world record lines moving down the swimming pool, national flags and names of swimmers draped over lanes. It's the homography trick. There's a swimming pool, and there's a national flag and a name. It's a nice big rectangle and I want that to lie over the lane. The name banner is a plane, and there's a plane in the pool between the two lane ropes, so I can map those points to those points. I can compute a homography, I can warp that flag-and-name banner into a set of pixels that'll be kind of twisted, and then I can overlay it exactly on the swimming pool lane. So that's how that's done.

If you know the geometry of the scene and everything's a plane, your life becomes very, very easy, and there are many cases like that. Imagine a surveillance camera looking down at a flat car park. If you compute the homography, then from any point
that you see in the camera you can work out its coordinate within the car park. So there are many cases where cameras are looking at environments that are a plane, which brings us to the prac.

In the prac, a long time ago, you built a robot that worked on a little sheet; it worked on a plane. And you're taking pictures with a camera, which results in an image being formed on a plane, the image plane within the camera. So here we have the camera image, which has got coordinates u and v, and there's the worksheet that your robot's going to sit on, which has got coordinates x and y. In the camera image we see some rotated, twisted, skewed version of the sheet that the robot's sitting on. We've got two coordinate systems: the (x, y) coordinate system that your robot understands, and the (u, v) coordinates of things that your camera can see. You already know how to figure out the u and v coordinates of a blue blob or a green blob or a red blob. The problem is, how do you get your robot to reach that? They're in different coordinate frames.

So the answer is a homography. We have a point in the image, which I've called p, with coordinates u and v, and another point on the robot worksheet, which I'm going to call q, with coordinates x and y, and I can compute a homography that maps p to q, or the other way around. The interesting thing with a homography is that you can multiply by the inverse of the homography matrix and it does the opposite mapping.

So let's say we've got four points in the image plane, P, and four points, X (sorry, that should be Q), which are in the robot plane. I put those coordinates into columns, so I build up a matrix with four columns: the four image points, the four world points. Image points are measured in pixels; world points are measured in millimeters. What's really important is that the points correspond, so the first column of this
matrix, (u1, v1), has to be the image coordinates of the point (x1, y1). You get what I mean by correspondence? They have to be the image plane coordinates of that particular point in the world. If you swap those around, bad things will happen.

On the worksheets that you've looked at so far there have been nine blue dots. These are the calibration dots. You can measure the coordinates of the nine dots on the worksheet with a ruler, and in your software you can find the coordinates of the nine blue dots in the image. Once you've got those two sets of data, you can estimate this homography matrix. Once you've got the homography matrix, you can map the coordinates of the green square into (x, y) coordinates that your robot can reach. Now, I've put nine dots on the sheet. The theoretical minimum is four, but you get a better estimate the more points you use, so that's why there are nine dots on the sheet.

So this is a really important part of the lab. You're going to use the nine dots to estimate the homography between the camera image and the robot worksheet, then you're going to use that homography to map all those other shapes into the robot workspace, and then you're going to use the robot kinematics to reach to that (x, y) coordinate and position your pencil in the middle of the shape that I ask you to reach. So is that all clear?

The MATLAB functions that you need to do this: there is the homography function, which is in the toolbox and has got a help page. There are two matrices: P is the matrix that contains the image points and X is the matrix that gives you the robot workspace coordinates. The homtrans function then maps image points into robot coordinates. You've got to get them the right way around: if you compute the homography not from P to X but from X to P, you'll end up computing the inverse homography, which will map in the opposite direction. So
when you compute the homography as it is at the top, that's the homography from P to X, from image to robot, and you then apply that homography to image points and it will give you robot coordinates.

All right, so a summary of what we've covered. We've talked about mapping points from 3D to 2D. We talked about how easy things are when you use homogeneous coordinates. Always remember when you do these things that the result is a homogeneous coordinate: you always need to divide the last number into all the other numbers and throw the last one away to get back to a Cartesian coordinate. This trick for going between homogeneous and Cartesian applies to two-dimensional Cartesian coordinates, so (x, y) becomes (x, y, 1), and to three-dimensional Cartesian coordinates, so (x, y, z) becomes (x, y, z, 1). The toolbox functions do this conversion between Cartesian and homogeneous automatically for you, so you're not going to see it there, but you might see it in a pencil-and-paper exercise in a tutorial or an exam, for instance. Homogeneous coordinates are scale invariant; that's a very important property. And we can map points from one plane to another using this thing called a homography matrix. So that's the summary.
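
Editor's sketch (not part of the lecture, which uses MATLAB toolbox functions): the calibration idea described in the captions can be illustrated in plain Python. It fixes the bottom-right element of H to 1, uses two equations per point pair to solve for the eight unknowns from exactly four correspondences, and then maps an image point into worksheet coordinates by dividing by the last homogeneous element. The function names and every point value below are hypothetical, chosen only for illustration, and with all nine dots one would normally use a least-squares fit rather than this exact four-point solve.

```python
def to_homogeneous(p):
    """Append a 1: (x, y) -> (x, y, 1)."""
    return [*p, 1.0]


def to_cartesian(ph):
    """Divide by the last element and drop it."""
    *head, w = ph
    return [c / w for c in head]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def estimate_homography(image_pts, world_pts):
    """Return 3x3 H (with H[2][2] = 1) mapping image (u, v) to world (x, y).

    Assumes exactly four correspondences, listed in matching order.
    """
    A, b = [], []
    for (u, v), (x, y) in zip(image_pts, world_pts):
        # x = (h11 u + h12 v + h13) / (h31 u + h32 v + 1), rearranged to be linear in h
        A.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        b.append(x)
        # y = (h21 u + h22 v + h23) / (h31 u + h32 v + 1)
        A.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        b.append(y)
    h = solve(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, p):
    """Map a Cartesian point through H and convert back to Cartesian."""
    ph = to_homogeneous(p)
    qh = [sum(hij * c for hij, c in zip(row, ph)) for row in H]
    return to_cartesian(qh)


# Hypothetical calibration data: dot centres in pixels and in millimetres.
image_pts = [(100, 120), (520, 90), (560, 400), (80, 430)]
world_pts = [(0, 0), (200, 0), (200, 150), (0, 150)]

H = estimate_homography(image_pts, world_pts)
print(apply_homography(H, (300, 250)))  # worksheet coordinates of an image point
```

Because only ratios matter, multiplying every element of H by the same non-zero constant gives the same mapped point, which is the scale invariance mentioned in the summary. Swapping the two point lists would estimate the inverse mapping, from worksheet to image.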
Info
Channel: Peter Corke
Views: 136,752
Rating: 4.9562311 out of 5
Keywords: image geometry, pin hole camera, thin lens, camera matrix, homography, homogeneous coordinates
Id: fVJeJMWZcq8
Length: 35min 22sec (2122 seconds)
Published: Tue Oct 09 2012