Lecture 16: Stereo

Captions
So this is going to be a short lecture. It's a very simple idea, and it is about stereo. As we were saying yesterday, one of the basic steps in understanding images and video is to recover shape from images or video. These are the different methods which essentially recover the 3D shape, and in the case of motion also compute the 3D motion: the rotation and translation of the objects, or the camera, or both. This works whether you are given one image or multiple images. There are many methods. Stereo is the one we are going to talk about today. Motion we talked about yesterday, with the structure from motion methods. Then there is shading, which requires one image, photometric stereo, texture, contours, and so on. These are different ways to exploit the image information to recover 3D shape.

So shape from stereo is very intuitive, and most of you know it: you have two eyes, so you get two images. The image in the left eye is a bit shifted compared to the right image, and that shift, or disparity, gives you an idea about the depth. Depth means distance from the camera: an object closer to you has smaller depth than objects which are farther away. Also, when you go to watch movies with those glasses, the polarized glasses, when you watch those IMAX movies which are in 3D, that is essentially using stereo. You get two images, and if you don't use the glasses you will see them like that, okay? They will be kind of superimposed, two images with disparity. When you wear the glasses, you are filtering out one of the images for this eye and filtering out the other image for that eye. So you get two images which are shifted, you have the capability to fuse them, and you see in 3D. That's the whole idea.

So in this case you get two pictures, the left and right images, which look like that, and given these two images you want to write a computer program which will recover the depth, that is, the
distance from the camera to each pixel. It's called a depth map, or it is called a disparity map, and that's what stereo is about, okay? So given these two images, which are from Szeliski's book, you will get a depth map like that. It is an image: at every pixel you have a number which tells you the distance or disparity of that pixel. Distance means how far it is from the camera. If you look, these are closer to you as compared to this, so this is darker than this one, and that is the 3D information which you get, okay?

So now the first thing is we have to show with geometry that it is possible to recover depth from two images, and the geometry is very simple. Then we will talk about how to write a program to actually do that, given two images. So we have a point in 3D which we call W, and it has coordinates (X, Y, Z), okay? We have two cameras, camera one and camera two, and the separation between these two cameras is B, which is called the baseline. It is like your two eyes: those are the cameras, and there is a distance between them. The image in the left camera is formed here, because the ray from the point goes through the lens and the image forms here, and the image in the right camera forms here, okay? And this is the focal length of both cameras, the distance from the image plane to the center of the camera. Now this is the image coordinate x1 in the left, this is x2 in the right, and the difference between the two image positions is called the disparity, okay?

And these are the definitions: B is the baseline, f is the focal length, C1 and C2 are the camera centers, and x1 and x2 are the image locations in the left and right cameras. Given this setup, we want to find out how we can determine Z, which is the depth. Knowing where the image is in the left and where it is in the right, knowing the focal length, and knowing the baseline, we want to find the depth, okay? And it's very simple. What we are going to do is look at these two triangles. One is the big triangle,
which is W, x1, x2, and the other is a small triangle, which is this one, W, C1, C2. These two triangles are similar, so we are going to look at the corresponding sides. Take this side, the altitude of the big triangle, which is Z + f, the depth plus f, and the corresponding altitude of the smaller triangle, which is Z. So (Z + f) / Z should be equal to the ratio of the bases. The base of the big triangle is x1 plus x2 plus the baseline B, and that should be divided by the base of the smaller triangle, which is B: (Z + f) / Z = (x1 + x2 + B) / B. These two have to be equal because the triangles are similar, okay?

That makes it easy, because now we know x1 and x2 from the image (I'll tell you how to find them), B is fixed, because for these cameras it is fixed how far apart they are, and f is also known, so we can find Z. From here you can manipulate this and show that Z = f B / (x1 + x2): the focal length times the baseline, divided by x1 plus x2, which is the disparity with x1 and x2 measured as in this figure, and which we will call d. So as you see, depth is inversely proportional to disparity, the displacement between the image in the left and in the right. That's what you saw earlier in this image: when you go to an IMAX movie without the glasses you will see something like that, the two images superimposed but still separated, and depending on the disparity you will see the depth, how far that is. So that is very simple geometry, and it is used to do this with a computer program, okay?

So here are some examples. We have a pair of images, left and right, which is normally called a stereo pair, and this is the depth recovered. Here is another pair and the depth recovered. Once you have the depth, you can synthesize an image from a different viewpoint, and that's why it's very useful in computer graphics and so on. One thing is that we are going to assume that these images are rectified, which means the images are aligned like that, so
that we can take a point, go to exactly the same line in the other image, and find the match there. Therefore the matching will be in 1D: there won't be any displacement along the y axis, only displacement along the x axis. Typically, when the images are not rectified or aligned, we rectify and align them first, and as we talked about, one application of the fundamental matrix is to rectify these images, okay? So from now on we will assume they are rectified, and we will just find the match in the same row.

So here is the example. We have a left image and a right image, and let's say we want to find the disparity of this pixel. What we will do is take a window around it, go to exactly the same line in the right image, and find out where the best match of this window is. If it looks exactly the same, then it is the same point, and you find how much the displacement is. So we look here, we look here, we look here, and maybe we'll find the best match here. This plot shows the disparity, or displacement, from this position, which is exactly here, whether this way or that way, and then, for a given disparity, the difference or error between this window and this window. If there is a match, the error has to be very small, ideally zero. If there is no match, the error will be big. Therefore we look at the different possible disparities, which are these different locations, and whichever gives us the smallest error, we say that's a match and that's the disparity.

Yeah, yeah, I'm going to explain that to you. It will basically be the difference between the pixels in this window and the pixels in the window you are matching against. You take the pixel-by-pixel difference between this and this, you square it, and you sum it, and that's the error, okay? It's called the sum of squared differences. Then once we know the disparity we can find the depth, and that's the idea. It's pretty simple, very intuitive, okay? So now the next thing you will ask is: how do we find the match? How do you find that
this window in the left matches with the right, and which location is the best? There are methods called correlation-based methods, and essentially we can compute the disparity for each pixel by finding the correlation. There are many different measures. One is called the sum of squared differences, and we have talked about it a little bit before: we take a window in the right image and one in the left image, subtract pixel by pixel, square, and sum, and that gives you the error, okay? That's the sum of squared differences. Or, instead of squaring the difference, we can take the absolute value and sum that up; that is called the sum of absolute differences. Or, instead of finding the difference and squaring, we can multiply: if the windows are similar the product will be large, and if they are not similar it will be small. This is called cross-correlation, okay?

Then we have normalized correlation, which is similar to this one, but here we are normalizing. We do the correlation of left and right, but then we normalize by the total sum of squares, so that when the windows are exactly the same we always get one. Then we have another way, where we take the mean of the left window and the mean of the right window, and the standard deviations of the left and the right; these are the means and standard deviations. We subtract the corresponding mean from each of the left and right windows, then we do the correlation and divide by the standard deviations. So these are different measures, and they have different advantages and disadvantages; I just want to let you know that you have a choice in what you can use, okay?

In general, correlation is a very important concept, as we have been seeing in edge detection, applying masks, computing derivatives, and all those things. Correlation and convolution are related. Correlation can find the similarity or dissimilarity between windows or patches, and
these are different ways to do it, as I just explained to you. Now, correlation is not only used for stereo; it can be used for other things. And correlation does not have to use the intensity value or color: you can take the Laplacian of Gaussian output and do the correlation on that, or instead of the intensity you can use the gradient magnitude, or get the gradient vector at each pixel and match those. So there are different variations, and they have advantages and disadvantages, but the basic notion of correlation is very simple. You take a window here, go to the next image, take its corresponding window, multiply pixel by pixel and add it up, and that's the correlation; or subtract pixel by pixel, square the difference, and add it up; and so on. That gives you how similar or how dissimilar two windows are, okay?

There's another thing called mutual information, a similar idea, where you look at the windows and find the joint distribution of both windows, then find the marginal distributions of the left window and the right window, and use those to say how much mutual information there is between the two windows. So this is pretty interesting work. The basic idea really works, it has been known for more than 50 years, and it's used in many places.

Now, as we said, in stereo we have to search only along the same line, the same scan line. But we can use the same idea to find the optical flow. As you know, optical flow also finds the displacement, and that displacement can be in x and y; that's why we have (u, v). So you can use this block matching for computing optical flow, and it is also used in MPEG: when they do compression, that's where the motion vectors are used. So MPEG motion vectors, optical flow, and stereo all use it. In stereo, of course, we only have displacement in x, so it's a simplification, a special case of block matching. So the
idea is that we have a block or a window. In MPEG each block is 8 by 8. We take this block in the right image, for example, then go to the left image and look at possible matches everywhere in the 16 by 16 window, and find where the best match is, okay? So we look at lots of places. This is general block matching, not necessarily stereo; in stereo we would just look along this row. So now we have these eight pixels: if the origin is here, the indices are 0, minus 1, minus 2, minus 3, down to minus 7 in x, and similarly 0, minus 1, minus 2, down to minus 7 in y. Then the possible displacements are from minus 4 to plus 4 in x and from minus 4 to plus 4 in y. You can write this down mathematically.

So step by step, the process is that you take each 8 by 8 block centered on a particular pixel (x, y), go to the other image, take a 16 by 16 search window, compute the sum of squared differences for all possible blocks in it, and pick the one which has the least SSD. The displacement to wherever it matches gives you the optical flow, the displacement from (x, y) to (x', y'). In stereo it will be just a displacement in x.

This again shows that, in order to compute the sum of squared differences, you take frame k and frame k minus 1 if it is motion or optical flow; for stereo these become the left image and the right image. You want the pixel-by-pixel difference over an 8 by 8 window, so these indices i and j run from 0 to minus 7, and in the other image you displace by u and v. So you take one particular displacement (u, v) and find the value, take another displacement (u, v) and find the value, look at all these values, and pick the one which gives you the minimum sum of squared differences. That is the definition of arg min, which is saying that u and v will vary: u will vary over minus 4, minus
3, minus 2, minus 1, 0, 1, 2, 3, 4, and v will vary over minus 4, minus 3, minus 2, and so on. So you have lots of possibilities. For each possibility you compute this sum, then you pick the one which is the minimum, and the displacement corresponding to it is the answer. So it's a very simple way to describe this whole slide in a formula, and that's the idea of mathematics: you get a compact description once you understand what every symbol means, okay?

Then we have different versions of this: absolute difference, and maximum matching pixel count. For the latter, we look at the pixel-by-pixel difference. We take the pixel in the k-th frame, or the left image, and in the other image, and find the absolute difference. If the absolute difference is less than some threshold, the pixels are similar and we say one; otherwise zero. Then we have another summation here, where we count how many of them are ones, which measures how many pixels are similar, and that gives you the match. Here we are looking at similarity; in the SSD we are looking at dissimilarity. So there you are minimizing and here you are maximizing, but it's the same thing: you can put a minus sign here and minimize, okay?

Cross-correlation is the same kind of thing: instead of subtracting, we multiply. And as I have shown you before, from the subtraction you can actually derive the correlation, because (a - b)^2 = a^2 + b^2 - 2ab. If you look at that, a^2 and b^2 do not contribute to the correlation; the only contribution is ab. So minimizing (a - b)^2 is like minimizing -2ab, or maximizing ab, and that's the correlation. So the sum of squared differences and correlation are related; it's a pretty simple idea. And then there is normalized correlation and all these kinds of things; you can
look at those. So now we want to talk about one stereo algorithm which is different from correlation, and this is what you are going to implement if you want to do the bonus program. It is very interesting, and it uses an algorithm called simulated annealing, which can actually be used to find the maximum or minimum of any nonlinear function, okay? This was proposed by a researcher called Barnard. In stereo you want to look for similar intensity, okay? It's similar to the brightness constraint that we looked at in optical flow. And we also want the disparity map to be smooth, because the disparity map, as I showed you, is like an image, and that image should be smooth, since objects are smooth. Just as we say optical flow should be smooth, the disparity map should be smooth, okay?

So the function he minimizes is this one. Take the right image and the left image and find the absolute difference between the two in a small neighborhood, here 3 by 3; that should be small. That's the same thing we talked about: if you want to say this window matches that one, they should be very similar. And this is the disparity, how much you move in the x direction in the other image. In addition to that, we put in the smoothness constraint on the gradient, the derivative, of the disparity map. Ultimately, given two images, we are going to get D(x, y): at every pixel we will have a disparity. So we put in the constraint that the derivative of this disparity map, which is the difference between the disparities of consecutive pixels, should be small. That's it; that's the function we want to minimize, summed over all of the pixels.

Now you can look at the exhaustive search. You say, well, if I take this pixel, as we are doing in correlation, is it
small? Or if I look at this one, is it small? You can do the search that way, and that's what we have been doing. But now you want to do a global search. You have lots of pixels in the image, and let's say every pixel can have at most 10 disparity values: 0, 1, 2, 3, 4, and minus 1, minus 2, and so on. Positive disparity means move right and negative means move left; it can be both ways. If you have a 128 by 128 image, then each pixel can have 10 possible values, which means you have to look at about 10 to the power 16 thousand different combinations for the disparity map. That's a huge, huge search space, and you need it because you want to impose the smoothness constraint, which is over all the pixels, okay? So it is not possible to do that. That's why this guy came up with the idea: let's use simulated annealing so that we can get this minimum quickly. This is very important; simulated annealing is used in many applications. Basically it is very simple and very intuitive, and that's what you are going to implement, okay?

So what is simulated annealing? It is similar to the physical phenomenon. When you have a metal, you increase the temperature, so all the molecules, electrons, and so on start moving; then you slowly lower the temperature, and you get a very nice annealing. That's what they do in physics to understand different material constants. So he used the same thing for minimizing a function. As you saw, we will come up with a disparity value for each pixel, and we will call that a state of the system, okay? Now, we don't know what the right solution is, so we will start with a random case and say, well, this is the disparity. So we will select a random state, which means we will select the disparity, and then we will select
the temperature; this is annealing, so that's why there is a notion of temperature. So we have the first state s, from any random number generator, and then we select another random state, s prime. They will be different because they are random. Then we ask: if you take the state s, what is the energy? The energy is the function I showed you here, okay? It is the energy because it depends only on the disparity. So we have the energy for s, with its D(x, y), and the energy for s prime, and we find the difference, okay? Then we say that if the difference is less than zero, we take the new state s prime, okay? If it is not, then we generate this probability, which is called P, and it is an exponential: P = e^(-Delta E / T), e to the power minus Delta E, the energy difference, divided by the temperature, okay? In the beginning there will be a high temperature, so there will be lots of random moves, but then slowly we reduce the temperature.

So we get a probability, and then we generate another random number between zero and one, because a probability is between zero and one. So from here we get a number from zero to one, and from here we get a number from zero to one, and at random we will sometimes choose the wrong state. That is because this function may have local minima, so if we always just follow the decrease we may not get the right answer. Here we take a wrong step: even though the energy is going up, we take it at random, and that's why we are generating a random number here. So if x is less than P, we take the new state, okay? And if there is no decrease in energy over several iterations, then we lower the temperature, and we keep repeating. That's the general idea of simulated annealing. It's very simple, and you can use it for any function you want to minimize. As you know, when you have a nonlinear function it's very hard to minimize; there are lots of difficult problems there. This matters especially for functions for which you
don't have an analytical expression. As we have been saying, if you have a function you want to minimize, you can just differentiate it and set the derivative equal to zero, but that works only for functions which are continuous and for which you have an analytical expression. There are functions for which we don't have an analytical expression: you have data, and you want to minimize over that. That's why you would use these kinds of methods. So that's it; that's basically the method. And the results: here are the left image and right image, and you get a depth map like that; and here left and right, and the result is a ball, a sphere.

Now, in general, stereo is a very active area of research. There are lots and lots of papers, and there are data sets which are available. This one is from a university in Japan. This is one of the images, and this is the ground truth. They can find that disparity with, maybe, a laser rangefinder, which will give you the depth; even Kinect will give you a depth map. So that is the ground truth. Then this is something like what we will get from correlation-based matching, and you can compare it with the ground truth; it still has some problems. And this is one of the best methods from a few years ago, which gives very good results that are very similar to the ground truth, okay? So there is a benchmark with standard results and so on; people have been looking at that.

Once you recover the depth or disparity, you can do lots of things; there are lots of applications. Here is an image from Szeliski's book, which you have access to. This is the image, this is the computed disparity map, and you can visualize it from a different viewpoint. This is another example, where you have two images and you can synthesize the image in between them. This one is synthetic, but it looks real. Then this one, then this one; then you can generate a 3D model of a face and synthesize a
new image, which is artificial. Or you can do virtual reality: you can superimpose the real with the virtual, and you can do lots of things, like come up with a 3D model of a human like that. So there are many, many more applications, but this gave you a basic idea of what stereo is: the geometry, which is very intuitive, with the left and right cameras, the baseline, the focal length, and the depth; and also two very simple algorithms. One is correlation-based, which is matching, and the other is simulated annealing, which should work better than correlation, and which is doing a search over a very complicated function. The idea there is that, in order to find the minimum of a function, one way is to start somewhere, take a step, and if the function is going down, keep going in that direction until you get a minimum; that's called gradient descent, okay?

So I have a very nice description of this in my book; these are the different sections you should look at, including simulated annealing, and these pictures. And of course there is a whole discussion in Szeliski's book, chapter 11.
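
The similar-triangles argument in the lecture gives Z = f B / (x1 + x2). The following is a minimal sketch of that formula, not code from the lecture. The function name `depth_from_disparity`, the sign convention (x1 and x2 measured so that their sum is the disparity), and the numbers are assumptions chosen for illustration.

```python
def depth_from_disparity(x1, x2, focal_length, baseline):
    """Depth Z = f * B / (x1 + x2) from the similar-triangles argument.

    x1 and x2 are the image offsets in the left and right cameras, measured
    so that their sum is the disparity (an assumed sign convention).
    All four quantities must be in the same length unit.
    """
    disparity = x1 + x2
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length * baseline / disparity


# Hypothetical numbers, for illustration only (all in millimetres).
f, B, x1, x2 = 50.0, 100.0, 2.0, 3.0
Z = depth_from_disparity(x1, x2, f, B)
print(Z)  # 1000.0

# The same Z satisfies the relation (Z + f) / Z = (x1 + x2 + B) / B.
print(abs((Z + f) / Z - (x1 + x2 + B) / B) < 1e-12)  # True
```

Halving the disparity doubles the depth, which is the inverse proportionality the lecture points out.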
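
The window search along one row of a rectified pair, as described in the lecture, might be sketched as follows. This is an illustration under stated assumptions: images are plain lists of rows, a point at column x in the left image is assumed to appear at column x - d in the right image with d >= 0 (the lecture allows shifts in both directions), and the names `window`, `ssd`, and `best_disparity` are hypothetical.

```python
def window(image, row, col, half):
    """Return the (2*half+1) x (2*half+1) patch centred on (row, col)."""
    return [r[col - half: col + half + 1] for r in image[row - half: row + half + 1]]


def ssd(patch_a, patch_b):
    """Sum of squared differences between two equally sized patches."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(patch_a, patch_b)
               for a, b in zip(row_a, row_b))


def best_disparity(left, right, row, col, half=1, max_disp=4):
    """Try d = 0..max_disp on the same row and keep the smallest SSD."""
    reference = window(left, row, col, half)
    best_d, best_err = 0, float("inf")
    for d in range(max_disp + 1):
        c = col - d                      # assumed convention: match sits d columns to the left
        if c - half < 0:                 # window would leave the image
            break
        err = ssd(reference, window(right, row, c, half))
        if err < best_err:
            best_d, best_err = d, err
    return best_d, best_err


# Toy rectified pair: the left image is the right image shifted by 2 columns.
right = [[c * c + r for c in range(10)] for r in range(3)]
left = [[right[r][c - 2] if c >= 2 else 0 for c in range(10)] for r in range(3)]
print(best_disparity(left, right, row=1, col=5))  # (2, 0)
```

The plot of error against disparity that the lecture shows corresponds to the sequence of `err` values computed inside the loop.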
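
The similarity measures named in the lecture can be written compactly for two windows flattened into lists. This is a sketch using common textbook definitions, which may differ in detail from the slides: normalized correlation here divides by the square root of the product of the two sums of squares, and the mean-subtracted version is computed as normalized correlation of the mean-subtracted windows. All function names are hypothetical.

```python
import math


def ssd(a, b):
    """Sum of squared differences (dissimilarity, minimise)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def sad(a, b):
    """Sum of absolute differences (dissimilarity, minimise)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def cross_correlation(a, b):
    """Sum of pixel-by-pixel products (similarity, maximise)."""
    return sum(p * q for p, q in zip(a, b))


def ncc(a, b):
    """Normalised correlation; equals 1 when the windows are identical."""
    denom = math.sqrt(cross_correlation(a, a) * cross_correlation(b, b))
    return cross_correlation(a, b) / denom if denom else 0.0  # flat windows: define as 0


def zncc(a, b):
    """Correlation after subtracting each window's mean, normalised by spread."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return ncc([p - mean_a for p in a], [q - mean_b for q in b])


def matching_pixel_count(a, b, threshold):
    """Count pixels whose absolute difference is below the threshold (maximise)."""
    return sum(1 for p, q in zip(a, b) if abs(p - q) < threshold)


a = [1, 2, 3, 4]
b = [5, 7, 9, 11]
# (a - b)^2 = a^2 + b^2 - 2ab, summed over the window:
print(ssd(a, b) == cross_correlation(a, a) + cross_correlation(b, b) - 2 * cross_correlation(a, b))  # True
```

The last line checks the expansion the lecture uses to relate the sum of squared differences to correlation.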
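
The general block matching step, where (u, v) is chosen as the arg min of the sum of squared differences, could look like the sketch below. It is an illustration only: it uses non-negative block indices rather than the 0 to minus 7 indexing on the slide, a small block and search range so the toy example stays readable, and hypothetical names `block_ssd` and `best_motion`. The pixel (x + i, y + j) of the current frame is compared with (x + i + u, y + j + v) of the previous frame.

```python
def block_ssd(prev, curr, x, y, u, v, size):
    """SSD between the block at (x, y) in curr and the block displaced by (u, v) in prev."""
    total = 0
    for j in range(size):
        for i in range(size):
            diff = curr[y + j][x + i] - prev[y + j + v][x + i + u]
            total += diff * diff
    return total


def best_motion(prev, curr, x, y, size=8, search=4):
    """Return (u, v, ssd) minimising the SSD over u, v in [-search, search]."""
    height, width = len(prev), len(prev[0])
    best = None
    for v in range(-search, search + 1):
        for u in range(-search, search + 1):
            inside = (0 <= x + u and x + u + size <= width and
                      0 <= y + v and y + v + size <= height)
            if not inside:
                continue                 # skip displacements that leave the frame
            err = block_ssd(prev, curr, x, y, u, v, size)
            if best is None or err < best[2]:
                best = (u, v, err)
    return best


# Toy frames: curr is prev displaced so that the true answer is u = 1, v = -1.
prev = [[7 * r * r + c ** 3 for c in range(6)] for r in range(6)]
curr = [[prev[r - 1][c + 1] if r >= 1 and c + 1 <= 5 else 0 for c in range(6)]
        for r in range(6)]
print(best_motion(prev, curr, x=2, y=2, size=2, search=1))  # (1, -1, 0)
```

Restricting the search to v = 0 gives the stereo case on rectified images, where only a displacement in x is allowed.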
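
The simulated annealing loop described in the lecture (random state, energy difference, acceptance with probability e^(-Delta E / T), gradual cooling) can be sketched on a single scan line. This is a simplified illustration, not Barnard's published algorithm or the bonus assignment: it works in 1D, uses a per-pixel absolute difference instead of a 3 by 3 neighbourhood, proposes a change to one pixel at a time, cools geometrically after a fixed number of moves, and keeps the best state seen. The weight `lam`, the schedule parameters, and all names are assumptions.

```python
import math
import random


def energy(left, right, disp, lam=1.0):
    """Data term |L(x) - R(x - d(x))| plus smoothness term |d(x) - d(x+1)|."""
    data = 0.0
    for x, d in enumerate(disp):
        xr = min(max(x - d, 0), len(right) - 1)   # clamp at the border (assumption)
        data += abs(left[x] - right[xr])
    smooth = sum(abs(disp[x] - disp[x + 1]) for x in range(len(disp) - 1))
    return data + lam * smooth


def anneal_disparity(left, right, max_disp=3, t_start=10.0, t_end=0.05,
                     cooling=0.95, moves_per_temp=200, lam=1.0, seed=0):
    """Return (best_state, best_energy, initial_energy) for one scan line."""
    rng = random.Random(seed)
    state = [rng.randint(0, max_disp) for _ in left]      # random initial state
    current = energy(left, right, state, lam)
    initial = current
    best, best_e = list(state), current
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            x = rng.randrange(len(state))
            old = state[x]
            state[x] = rng.randint(0, max_disp)           # proposed state s'
            proposed = energy(left, right, state, lam)
            delta = proposed - current
            # Accept downhill moves always, uphill moves with probability e^(-delta/T).
            if delta < 0 or rng.random() < math.exp(-delta / t):
                current = proposed
                if current < best_e:
                    best, best_e = list(state), current
            else:
                state[x] = old                            # reject: restore old value
        t *= cooling                                      # lower the temperature
    return best, best_e, initial


# Toy scan lines: left is right shifted by two pixels (hypothetical data).
right = [10, 10, 80, 80, 30, 30, 60, 60, 20, 20, 90, 90]
left = [right[max(x - 2, 0)] for x in range(len(right))]
state, e_best, e_start = anneal_disparity(left, right)
print(e_best <= e_start)  # True, because the best state seen is returned
```

At high temperature almost every proposal is accepted, which gives the many random moves the lecture mentions; as the temperature drops, uphill moves become rare and the search settles into a low-energy disparity map.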
Info
Channel: UCF CRCV
Views: 29,239
Rating: 4.9047618 out of 5
Keywords: education, stereo, Barnard, disparity, simulated annealing, depth
Id: jzis4WE3Vc8
Length: 33min 4sec (1984 seconds)
Published: Tue Nov 20 2012