CVFX Lecture 13: Optical flow

So today's lecture is about optical flow. Optical flow is probably something you've heard of before, because it's a commonly used term and a huge area of research in computer vision. For better or worse, today is mostly going to be me deriving things on paper; I'll show you a couple of pictures, but fundamentally the output of an optical flow algorithm is not that exciting to look at. Mostly I want to concentrate on the big-picture methods for how people estimate optical flow. People have been working on this for so long that there are a couple of major ideas I want to talk about today, but getting really great optical flow requires a lot of nitpicky, careful attention to detail, and many researchers have proposed little modifications here and there. I'll go over some of those, but to really get into it you have to dig down deeper than I'm able to in one overview lecture.

First, let's talk about what is meant by optical flow. Remember that the theme of this chapter is what I call dense correspondence: for every pixel in image 1, we want to know where that pixel goes in image 2. That's like saying that at every pixel (x, y) I have a vector, call it (u, v), that points to some other location in image 2, namely the point (x+u, y+v), and the idea is that the pixel intensities at those two locations should be equal. We assume this motion vector is induced by underlying camera motion and possibly underlying object motion; those are the two things that can cause the apparent motion, and in general the two are mixed up together in the motion vector.

Now, in optical flow, instead of thinking of these as two arbitrary images taken at different times, it's often instructive to think of them literally as samples in time, for example from a video sequence. So we could call one image I at time t and the other I at time t+1. Another way of saying this is that we can think of the image intensity as a function of space and time: the pixel at (x, y) at time t moves to (x+u, y+v) at time t+1. In this sense I've dropped the I1, I2 notation and said there's a single three-dimensional quantity I(x, y, t) that depends on both pixel location and time.

Also, just like for feature detection, most optical flow methods are derived in a grayscale world. Everything I say could pretty easily be extended to color; here I'm assuming a single channel of grayscale intensity, but if I wanted to I could assume the pixels were 3-vectors of RGB and require those RGB vectors to be the same. Today I'm going to talk about the general problem where there really aren't any major constraints on u and v; in theory, any pixel could go anywhere else.
We're going to talk about the stereo problem after spring break, where things are a lot more constrained. It turns out that if I have two cameras rigidly mounted to a bar, taking two images of the same scene at exactly the same time from slightly offset positions, there are some pretty severe constraints on where the correspondences can be. That turns the two-dimensional problem of estimating (u, v) into estimating one parameter per pixel, called the disparity. The constraint is called the epipolar geometry, and the problem is called stereo; we'll get back to it the week after next, after spring break. For right now, we're estimating the vector (u, v) at every pixel, and that's the story.

The key assumption we make in optical flow is called the brightness constancy assumption, and it makes sense: it says that, fundamentally, if I move the camera or stuff moves in the scene, the colors on the surfaces of objects cannot change. In the real world that's not always true, and we'll talk about some relaxations of it in a few minutes, but it's where we start from. It also ties in with a whole bunch of other assumptions; if you're a computer graphics person you may have heard of the Lambertian assumption, which basically says that things don't fundamentally change color depending on the direction you look at them from. In the real world the colors may change a little bit, but the way to think about brightness constancy is that we want I1(x, y) and I2(x+u, y+v) to be as close as possible.

So let's look at the very first optical flow algorithm. The first formalization of how to solve this problem was by Horn and Schunck, two very famous computer vision researchers, so it's often referred to as just HS. The idea is: what if we take the brightness constancy equation and expand it as a Taylor series? Remember, a Taylor series expansion is valid here because I'm expanding I(x+u, y+v, t+1) around (x, y, t), assuming the motion vector is relatively small. The first-order terms are the derivative in x multiplied by the increment u, the derivative in y multiplied by the increment v, and the derivative in t multiplied by the increment 1:

I(x+u, y+v, t+1) ≈ I(x, y, t) + (∂I/∂x)·u + (∂I/∂y)·v + (∂I/∂t)·1

Brightness constancy says the left-hand side equals I(x, y, t), so those two terms cancel, and what I'm left with involves the gradients of the image in both space and time:

(∂I/∂x)·u + (∂I/∂y)·v + ∂I/∂t = 0

Another way to think about this: fundamentally I have the spatial gradient of I, the vector of the two spatial partial derivatives, dotted with the flow vector, equal to −∂I/∂t. So at every pixel I have this one constraint, and it tells me what the component of (u, v) should be in the direction of the gradient.
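To make the constraint concrete, here's a minimal NumPy sketch, assuming grayscale float images; the function name and the simple finite differences are my own illustration, not something from the lecture:

```python
import numpy as np

def gradient_constraint(I1, I2):
    """Finite-difference estimates of Ix, Iy, It for the linearized
    brightness constancy constraint  Ix*u + Iy*v + It = 0.
    I1, I2: grayscale float images of the same shape."""
    Ix = np.gradient(I1, axis=1)   # spatial derivative along x (columns)
    Iy = np.gradient(I1, axis=0)   # spatial derivative along y (rows)
    It = I2 - I1                   # temporal derivative with a unit time step
    return Ix, Iy, It

# For a candidate flow field (u, v), the per-pixel residual of the
# constraint is  r = Ix*u + Iy*v + It,  which should be close to zero.
```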
But what I have here is one equation in two unknowns: I have u and v to estimate, and only this single equation relating them. That means I have an underconstrained system, so I need more constraints. In other words, the brightness constancy assumption, that colors don't change, is not enough by itself to estimate the entire flow. One way to see why: imagine I had a flat block of pixels here and a flat block there. I could have flow vectors pointing any which way between those flat pixels and they would all satisfy the brightness constancy assumption, but clearly some of those flows are better than others; I want a flow where the motion vectors aren't crazily mismatched.

The intuition, similar to the intuition we had for the matting problem, is that pixels near each other should have similar flow vectors, so we're going to impose that assumption. Again, this makes sense because it's true almost everywhere in the image. Suppose there's no moving object in the scene and I just move the camera: the motion vectors of all the points in the background are going to vary smoothly, so I'm not going to get motion vectors that totally disagree with each other from camera motion. Even if I have object motion, say a person moving in the foreground, most of the flow vectors still vary smoothly: all the vectors on the moving person will be roughly the same, and all the vectors on the background will be smooth. The place where this breaks down is the interface between the person and the background, where a pixel on the person right next to a background pixel may have a very different flow vector. So the one case where the assumption doesn't hold is foreground/background boundaries, but generally we're safe imposing some sort of smoothness assumption everywhere.

So what does the other constraint look like? To impose smoothness, we can minimize something like

|∇u|² + |∇v|² = (∂u/∂x)² + (∂u/∂y)² + (∂v/∂x)² + (∂v/∂y)²

Remember, what I'm estimating for optical flow is a (u, v) direction at every pixel, so (u, v) is a vector field over the domain of the image. This term measures the changes in the u values in x and y, and the same for v, and I want it to be small: I shouldn't get very many big changes in the flow vectors.
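Continuing the sketch above, the discrete smoothness term is just a sum of squared finite differences of the flow field; again this is a minimal illustration, not anyone's reference implementation:

```python
def smoothness_energy(u, v):
    """Discrete Horn-Schunck smoothness term: sum of the squared
    spatial derivatives of both flow components."""
    ux, uy = np.gradient(u, axis=1), np.gradient(u, axis=0)
    vx, vy = np.gradient(v, axis=1), np.gradient(v, axis=0)
    return np.sum(ux**2 + uy**2 + vx**2 + vy**2)
```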
Putting this together, the Horn-Schunck idea is to minimize a cost function that sums, over all pixels in the image, a term measuring how well the pixels fit the brightness constancy assumption, plus some factor times the smoothness term:

E(u, v) = Σ (Ix·u + Iy·v + It)² + λ Σ (|∇u|² + |∇v|²)

One way to think about this is that I'm minimizing something that depends on the data plus something that depends on the smoothness, and the factor λ trades off the two. If λ were huge, that would say that above all else I want the flow field to be smooth; if the data term played no role at all, I'd basically get a constant flow field, the smoothest thing possible. If I make λ zero, I can get flows that agree with brightness constancy but aren't smooth at all. So I want to choose λ to balance the two properties. This kind of term is often called a regularizer: I don't have enough constraints, so in their absence I add a smoothness term, and I have to decide how to tune λ to make the field appropriately smooth. It's a bit of a black art; λ may even change over the course of the iterations used to solve the problem. But this is the basic idea. Let me pause and ask: are there any questions about the setup?

[Student asks whether the data term is actually zero.] Ideally the data term is zero, but in practice, with real images, it's not going to be exactly zero. When I calculate my best u and v, the best I can hope for is that it's small, because real images deviate from brightness constancy: pixel noise, non-Lambertian surfaces, lighting changes, all of that contributes to non-ideal conditions.

[Student points out that two of the partial derivatives on the board are interchanged.] Oh yes, you're right, that's a mistake; thanks for catching it. Unfortunately there are a lot of x's and y's flying around here, my apologies.

So this is the fundamental idea, and it was presented back in the early '80s. All of these quantities can be estimated from the images: Ix and Iy are just spatial gradients of the image, and It is also a gradient, just taken in the time direction, so I can compute all of them from, for example, finite differences of pixels. And since in the real world I'm estimating a discrete flow field, the derivatives of u and v are likewise finite differences of the u and v fields; there's nothing here that I can't do in a discrete world, and in the book I mention how to set up the discrete versions of these quantities. Fundamentally, what this comes down to is solving one big linear system over all the possible (u, v) values: you'd have a big matrix whose entries come from the two original images, your unknowns would be all the (u, v) vectors, the right-hand side would come from the time derivatives, and you'd solve that linear system.
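In practice you rarely assemble that big matrix explicitly; the classic Horn-Schunck paper iterates a Jacobi-style update derived from the same equations. A minimal sketch, continuing the helpers above (scipy.ndimage assumed; alpha**2 plays the role of the lecture's λ, and the iteration count is arbitrary):

```python
from scipy.ndimage import uniform_filter

def horn_schunck(I1, I2, alpha=1.0, n_iters=100):
    """Minimal Horn-Schunck sketch: alternate between averaging the
    current flow and correcting it with the brightness constraint."""
    Ix, Iy, It = gradient_constraint(I1, I2)
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iters):
        u_bar = uniform_filter(u, size=3)   # local average of the flow
        v_bar = uniform_filter(v, size=3)
        # Update derived from the Euler-Lagrange equations of the energy above.
        num = Ix * u_bar + Iy * v_bar + It
        den = alpha**2 + Ix**2 + Iy**2
        u = u_bar - Ix * num / den
        v = v_bar - Iy * num / den
    return u, v
```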
There are some good things and some bad things about this. One nice thing about having a regularizing term like this is that, without getting too mathematical, the way the system is actually solved is reminiscent of a Poisson equation; it's a diffusion-like system that does a good job of filling things in smoothly. Even where regions are flat, if I have good constraints on how stuff moves around the region, those constraints will force the flow in the middle, where I may not have solid constraints, to follow along. So I get a very nice, smooth flow field.

One of the immediate problems, though, is that the Taylor series expansion, as you remember from calculus, is only valid when the increment, the (u, v) step, is very small. There's an underlying assumption that this only works when the motion vector itself is small to begin with, which means the two images have to look almost exactly the same, and we know that's not going to happen in most flow problems. For this to work you'd really want u and v to be less than a pixel, and of course motion that's under a pixel everywhere is basically no motion at all.

One way to get around this is to make the method hierarchical, and this idea applies not just to the Horn-Schunck algorithm but to optical flow algorithms in general. Remember how we talked about making a pyramid for an image: I take my big images and build, for example, a Gaussian or Laplacian pyramid of smaller and smaller versions. The idea behind a hierarchical optical flow method is: I make a pyramid for each image, and I first compute optical flow between the two images at the lowest level, the really tiny, blocky images. You basically want to go down the pyramid to the level where the flow vectors are on the order of a pixel. If I start with a 640-by-480 image and bring it down to, say, a 16-by-12 image, I can have pretty good faith that the motion vectors are not more than one pixel in that tiny, tiny image. So I compute the flow there, and then to get an estimate of the flow at the next level up, I can double the magnitudes of all the flow vectors and use that as an initial condition for the next optical flow problem.

Part of the trick, though, is that I can't use that initialization directly: even if my (u, v)'s are on the order of one at this level, they're on the order of two at the next level, and the Taylor expansion isn't valid for vectors of that size. So what you typically do, before estimating the flow at the new level, is use the current flow estimate to warp one image toward the other: push the pixels along their flow vectors to make a new image.
The idea is that this warped image should be pretty similar to the other image, so I then compute the flow between the warped image and that one. In other words, when I push the pixels at the coarse level along their flow vectors, I get an image that looks pretty close, but when I upsample I find I'm a little bit off, and the hope is that at the next level the error between the target image and the warped image is again on the order of one pixel. So I keep doing this: estimate flow, warp one image toward the other, and estimate flow again at the next finer level. At every step I'm always estimating flow vectors on the order of a pixel, but the composited warps add up to flow vectors that can theoretically be much larger than a pixel. That's the general idea; the algorithm I put in the book gives the more specific details.

[Student asks to make the procedure concrete.] Right, let me be more concrete, since I have a tendency to be hand-wavy in these lectures. First, create a pyramid for each image; call the pyramids I1_j and I2_j, where j indexes the level of the pyramid. I1_1 and I2_1 are the finest level, basically the original images, and I1_n and I2_n are the coarsest, blockiest images. I start with the two coarsest images, and the loop looks something like: for j = n down to 1, estimate the flow between I1_j and (a warped version of) I2_j, which should be on the order of a pixel at that level; that gives me a flow field at level j. Then I construct the initial flow field at level j−1. This is where the math gets a little messy: the flow at level j−1 is two times the residual flow estimated at level j, plus four times the residual flow estimated at level j+1, and so on, up to 2^(n−j+1) times the flow at the coarsest level. To build up where I should be at the next level, I concatenate all the flow fields below me, each multiplied by the corresponding power of two, because the images are all proportionately smaller by factors of two. Then I warp image 2 at level j−1 to a new image using that flow field. (I realize I drew my picture a little bit wrong here: I'm warping image 2, not image 1; I'll redraw it in a second.) So what I'm doing is pushing the image-2 intensities over by the estimated flow vectors.
Now I don't expect there to be a big difference between image 1 and the warped image 2, except maybe a sub-pixel offset at the current level. Then I set my working level to j−1 and iterate. The point is that I keep estimating flow fields at finer and finer levels: at each level I ask what the warp should look like based on everything I've learned so far, and then I estimate the finest little increment, a little adjustment, at the current level, until I work my way all the way up to the finest layer.

Let me redraw my picture, since I realize I drew it wrong. These are the coarsest, level-n images, and these are the level-1 images. This is my (u, v) flow at level n, estimated between the coarsest images. Then I push the next level's image over using twice that estimated flow field, so that its pixels move roughly to where they should be, and I estimate the flow between those two images; that gives me the flow at the next level. The image after that gets warped using both of those flow fields together, and so on: I keep bootstrapping my way up to the top, starting from the flow between the blockiest images and using each result to initialize the next sharper level.

[Student: so you estimate the flow, and then the next level's target is the warped second image?] Right: you use the accumulated flow to warp, and then you estimate the flow between the first image and that new warped image.

[Student: why are we warping that image? It would make more sense to me if we used the other one.] So the question is, why not warp the left-hand image instead? In theory you can do that too; it doesn't really matter which one you warp, as long as you're consistent about it. You could make a hierarchical implementation that kept changing image 1 instead of image 2; you just have to keep it straight.

So the final flow vector is a concatenation of flow fields from all the levels: with, say, four levels of the pyramid, the increment estimated at the coarsest level ends up scaled by 8, the next by 4, the next by 2, plus the finest increment. I know this is a little confusing the first time you see it. Other questions or comments?

[Student: I still don't understand how the warping helps.] The main thing is that if you don't do the warping, your flow vectors are going to be too large for the Horn-Schunck method to get a good estimate; I need the two images to be almost directly comparable at every level.

[Student: the reason I'm confused is that estimating the flow from left to right is not the same thing as estimating the flow from right to left.] That's a good point.
The left-to-right flow and the right-to-left flow are not always necessarily the same; in theory they are opposites of each other. If I were to compute the optical flow in the reverse direction, all my flow vectors should just be negated. Theoretically they're reversible; what causes that to break down, and I'm going to talk about this toward the end of class, are things like occlusions and other issues, so let me come back to that later.

[Student: what I'm trying to figure out is, if we're calculating the (u, v)'s from the left image to the right image, why we then apply them to warp the left-hand image.] Okay, this is a good question, and now you're making me doubt my own sanity, so let's think about it with two levels, a coarse one and a fine one. Suppose at the coarse level I have a point at (2, 3) and I estimate that it matches the point (2.5, 3.2), so my estimated flow vector (u, v) there is (0.5, 0.2). Now I bring things up a level: the point becomes (4, 6), its match becomes (5, 6.4), and the doubled flow vector is (1, 0.4).

[Back-and-forth with students about whether to move image 1's pixels forward along (u, v) or image 2's pixels backward along −(u, v); either way, the goal is to bring the two images closer together before estimating the residual flow.] You're right: what I want is to warp so that this pixel lands as close as possible to its match, and then the remaining error is on the order of a pixel. So either I pull image 2 back along the negated flow, or, conversely, I push the image-1 pixel forward, using the doubled vector (1, 0.4), to exactly (5, 6.4), and then estimate the residual flow between the warped image and the other one. That makes a lot of sense.

So it seems like the right thing to do is to either push image 1 forward or pull image 2 back; in principle they should be the same. In practice it arguably makes more sense to push image 1 forward, because that's literally what the flow model describes. I think I just had image 1 and image 2 reversed in my derivation, so let me go back and convince myself, but I believe the intuition is now correct. I was actually a bit worried while I was writing it down that I was going the wrong way, so good call; I'm glad we had this little talk.
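Here's a minimal coarse-to-fine sketch tying this together, continuing the earlier helpers (the function names are mine). It pulls image 2 back toward image 1, the sampling-based option from the discussion above, since backward (sampling) warps are much easier to implement than forward splatting; a real implementation would blur before downsampling and handle borders and flow composition more carefully:

```python
from scipy.ndimage import zoom, map_coordinates

def warp_back(I2, u, v):
    """Inverse-warp: sample I2 at (x+u, y+v), so the result lines up
    with I1 wherever the flow (u, v) is correct."""
    h, w = I2.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    return map_coordinates(I2, [yy + v, xx + u], order=1, mode='nearest')

def coarse_to_fine(I1, I2, n_levels=4):
    """Estimate flow at the coarsest level, then repeatedly upsample
    (doubling the flow), warp, and estimate a small residual flow."""
    pyr1, pyr2 = [I1], [I2]
    for _ in range(n_levels - 1):
        pyr1.append(zoom(pyr1[-1], 0.5))    # crude pyramid (no pre-blur)
        pyr2.append(zoom(pyr2[-1], 0.5))
    u = np.zeros_like(pyr1[-1])
    v = np.zeros_like(pyr1[-1])
    for J1, J2 in zip(reversed(pyr1), reversed(pyr2)):
        if u.shape != J1.shape:             # bring the flow up to this level
            u = 2 * zoom(u, np.array(J1.shape) / np.array(u.shape))
            v = 2 * zoom(v, np.array(J1.shape) / np.array(v.shape))
        J2w = warp_back(J2, u, v)           # image 2 pulled toward image 1
        du, dv = horn_schunck(J1, J2w)      # residual flow, ~1 pixel
        u, v = u + du, v + dv               # composite the increments
    return u, v
```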
All right. Just to give you a sense of what you get when you run an optical flow algorithm: here are a couple of example images where I moved the camera and there was no actual object motion, and this is the kind of result you get out of a flow algorithm, where at every point I visualize the (u, v) direction. In MATLAB, this is called a quiver plot: you put an arrow on top of each point. Part of the homework asks you to make such plots for a few canned optical flow algorithms. In this case I made the picture with a combination of MATLAB and then went in and touched up the arrows a little so they were more visible. You can do it all in MATLAB, but a word to the wise: what comes straight out of quiver may not be easy to read, so you probably want to decide for yourself how far apart the arrows should be spaced so you can see what's going on.

In this example you can see that going from image 1 to image 2 there's a clockwise twisting motion, and that's captured by the flow field even in places where there's nothing to grab onto. For example, there's a big flat background here where I don't have anything feature-wise to estimate flow vectors from, but the fact that the Horn-Schunck algorithm has this nice smoothness property forces all those vectors to follow along with what the rest of the picture is doing. That's why you still get a good estimate of the flow vectors everywhere, even without much nearby image texture.
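For the visualization itself, a tiny matplotlib sketch along these lines works; the step spacing is exactly the knob I'm suggesting you tune by hand (the value here is arbitrary):

```python
import matplotlib.pyplot as plt

def show_flow(I1, u, v, step=16):
    """Quiver plot of a flow field, subsampled so the arrows stay legible."""
    h, w = I1.shape
    yy, xx = np.mgrid[0:h:step, 0:w:step]
    plt.imshow(I1, cmap='gray')
    # With image coordinates (y pointing down), positive v draws downward.
    plt.quiver(xx, yy, u[::step, ::step], v[::step, ::step],
               color='yellow', angles='xy', scale_units='xy', scale=1)
    plt.show()
```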
So that's one key idea for optical flow. The other key idea is based on feature tracking, and it's called the Lucas-Kanade method. The main idea is to think about flow as a process of matching local patches of pixels. One thing about the Horn-Schunck method is that it wasn't really clear how nearby pixels were related to each other except via the smoothness term. Another thing I could do, which makes a lot of sense in the context of feature matching, is to draw a small box around each point and try to find the best-matching small box in the other image; it's like moving a whole box of pixels from here to there. So I can write an energy term where, for every point (x, y), I place a window around it; this could be a box filter in 2-D, or a Gaussian like we were using for SIFT, where I weight the pixels unevenly. Within that window, I want to minimize the sum of squared differences between the pixels in image 1 and the correspondingly displaced pixels in image 2:

E(u, v) = Σ over the window around (x, y) of [ I1(x', y') − I2(x'+u, y'+v) ]²

That's the brightness constancy assumption again, but now with the implicit assumption that locally, the (u, v) vectors inside the window are all the same for that neighborhood.

If I write out the implications of minimizing this function (after linearizing), I get something that looks very familiar from our feature-matching lecture: a 2-by-2 linear system at each pixel,

[ Σ Ix·Ix   Σ Ix·Iy ] [u]     [ Σ Ix·It ]
[ Σ Ix·Iy   Σ Iy·Iy ] [v] = − [ Σ Iy·It ]

where the sums are taken over the window. The matrix on the left is exactly the Harris matrix we derived last time for features; the difference is that now I'm solving this at every pixel in the image, not just at detected features. Last time, this matrix came up in the context of finding good features: for a feature to be good, the eigenvalues of this matrix had to be nice and large. If the eigenvalues aren't large, it means either the patch is flat, or I have the aperture problem, where there's a single edge and I can't estimate the motion along it. And that's exactly the problem with this method: the assumption is that there's enough texture inside the block to make the motion vector accurately measurable. The drawback is that for this to work at every pixel, I may need to choose my boxes to be relatively large, and then the assumption that the same (u, v) holds everywhere in the box may be compromised. That's exactly the trade-off: using local image information is good, but it may not be reliable if your boxes are too small.
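A minimal per-pixel Lucas-Kanade sketch, reusing the earlier helpers (box window via uniform_filter; the flat-region guard threshold is an arbitrary choice of mine):

```python
def lucas_kanade(I1, I2, win=7):
    """Solve the 2x2 Lucas-Kanade system, built from the Harris/structure-
    tensor entries summed over a window, independently at every pixel."""
    Ix, Iy, It = gradient_constraint(I1, I2)
    Sxx = uniform_filter(Ix * Ix, win)      # window sums of the products
    Sxy = uniform_filter(Ix * Iy, win)
    Syy = uniform_filter(Iy * Iy, win)
    Sxt = uniform_filter(Ix * It, win)
    Syt = uniform_filter(Iy * It, win)
    det = Sxx * Syy - Sxy**2
    det = np.where(det > 1e-6, det, np.inf)  # flat/aperture pixels get u=v=0
    # Cramer's rule for  [Sxx Sxy; Sxy Syy] [u; v] = -[Sxt; Syt]
    u = (-Syy * Sxt + Sxy * Syt) / det
    v = ( Sxy * Sxt - Sxx * Syt) / det
    return u, v
```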
The nice thing about Lucas-Kanade is that I can compute the flow vector at a given point somewhat independently of the rest, whereas Horn-Schunck has the advantage of coupling all the pixels together in one big system. That's the key difference: Horn-Schunck is global and Lucas-Kanade is very local. The best optical flow algorithms leverage both ideas: in practice, you take a cost-function term that looks like Lucas-Kanade and a cost-function term that looks like Horn-Schunck, weight them together, and that's how you obtain a good flow field. So what we're going to talk about next are refinements and extensions to the basic algorithms, combining best practices from everything we've said so far.

Okay, refinements and extensions. I think it's fair to say that if you took Horn-Schunck as I wrote it, or Lucas-Kanade as I wrote it, and expected great optical flow fields for real scenes, you'd be disappointed; you have to do a lot of extra little work to make your flow field good. There are still two key concepts in any flow algorithm: a data term, which says how well the pixels fit the brightness constancy assumption, and a smoothness or regularization term, which says how similar nearby flow vectors should be. Any optical flow algorithm has those two pieces.

One refinement: instead of pure brightness constancy, you can use a generalized version that says, rather than requiring the pixels to have exactly the same intensity or color, I expect the gradient at this point to match the gradient at the corresponding point. This helps, for example, when there are slight illumination changes in the scene between the two pictures: the main thing I want is that the edges in corresponding regions are basically the same. I don't necessarily need the colors to match exactly, but I need the differences in colors to match. In some sense it's like what we were doing with Poisson image editing, where we forced the gradients inside the target region to match those of the source. You can either replace the data term with this entirely, or use a data term with both pieces: something that penalizes differences in brightness, plus γ times something that penalizes differences in gradient. Depending on how seriously I take illumination changes, I make γ stronger or weaker.

Another refinement is combining Horn-Schunck and Lucas-Kanade directly: use the local Lucas-Kanade term for the data part and the Horn-Schunck term for smoothness. It's like taking the best part of each algorithm, since Horn-Schunck is good at enforcing global smoothness and Lucas-Kanade is good at using local spatial information.
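As an illustration of the gradient-constancy flavor of the data term, here's a hypothetical helper in the same style as the sketches above, with gamma as the trade-off weight just described:

```python
def data_residuals(I1, I2w, gamma=0.5):
    """Brightness-constancy residual plus gamma times a gradient-
    constancy residual, between image 1 and the warped image 2."""
    g1x, g1y = np.gradient(I1, axis=1), np.gradient(I1, axis=0)
    g2x, g2y = np.gradient(I2w, axis=1), np.gradient(I2w, axis=0)
    r_bright = np.abs(I2w - I1)
    r_grad = np.abs(g2x - g1x) + np.abs(g2y - g1y)
    return r_bright + gamma * r_grad
```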
This was the seed of an algorithm by Bruhn et al. that turned out to perform really well, and a similar idea appeared in an algorithm by Brox et al. For a while, these kinds of modifications gave much more reliable optical flow algorithms; if you look in the back of the book where I talk to visual effects people, they used both of these types of algorithms for quite a while to do their optical flow.

Another thing you can do to make things better concerns the smoothness assumption, which is valid almost everywhere: in the middle of foreground objects and in the middle of background objects you want to enforce smoothness, but you don't want to enforce it across foreground-to-background boundaries. So you can modify the smoothness assumption to say: don't smudge the optical flow across edges that could represent depth discontinuities. Often, a clue that one point is closer to the camera than another is a difference in color, so color edges in the image are proxies for possible depth edges in the scene. The way to think about it is: I don't want to demand consistency of flow vectors across an edge, because smoothing across the edge may force two vectors to be the same when they shouldn't be, but I can smooth along the edge. To draw this: suppose I have a strong edge in the image. I don't want to force a pixel on one side and a pixel on the other side to have the same motion vector, but I'm perfectly okay with two pixels along the same side of the edge having the same motion vector, because they're part of the same area. So the basic idea is: don't smooth across strong edges, partly because they could be depth discontinuities.

Here's a picture of a goat in a cage to illustrate. You can apply what's called an anisotropic diffusion tensor, which is a fancy way of saying you estimate, at every point in the image, an ellipse that tells you what direction the edge is pointing and how strong it is. Along the bars of the cage I get strong, narrow ellipses that run up and down along the bars; in a flat region I get a circle, which tells me there are no strong edges in that area. In other regions of the image, the ellipses follow along with how edgy the image is at that point. The idea is that the ellipse I compute for each pixel tells me over what region I want the flow vectors to be smooth: along the bars, I want the values inside the ellipse to be roughly the same, but I don't want to go outside the ellipse, because those may be different values. This simple incorporation of not smoothing across edges immediately makes your flow fields a lot better, because now you can tolerate differences across edges that plain Horn-Schunck would just smooth right over. This is a very common thing to do.
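The simplest edge-aware variant just downweights the smoothness penalty wherever the image gradient is strong; a full anisotropic diffusion tensor additionally encodes the edge direction, but a scalar weight like this sketch (sigma is a made-up tuning constant) already captures the don't-smooth-across-edges idea:

```python
def smoothness_weights(I, sigma=10.0):
    """Per-pixel weight for the smoothness term: near 1 in flat regions,
    near 0 at strong edges (likely depth discontinuities)."""
    gx, gy = np.gradient(I, axis=1), np.gradient(I, axis=0)
    mag = np.sqrt(gx**2 + gy**2)
    return np.exp(-mag / sigma)
```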
Another thing you see, and this is a general issue in lots of computer vision and estimation problems, is what I would call making a robust cost function. In the data term, for example, I have the difference between two intensities, squared. When I have a very large error and I square it, I get a huge number, and that can contribute a lot to my cost. You find that outlier points, say a really bad (u, v) estimate at one pixel, can contribute far too much to the cost function, throw off the whole estimation problem, and prevent the estimates from being good. To visualize this: as a function of the error, a quadratic cost climbs very steeply, like a parabola, and that's bad for resistance to outliers, because one point way out in the tail can produce a huge cost that pulls the solution away from what's right for the rest of the points. A big problem in estimation generally is how to prevent outliers from corrupting your estimate, and one way to do it is to not let high-error points blow up your cost function. Instead of a cost that grows quadratically, you could have one that grows linearly, which is a little better, or even better, something that looks linear and then tapers off, so it can't grow arbitrarily large. That way the influence of really bad outliers is mitigated, and that turns out to be really important for designing good algorithms. Questions?

[Student: couldn't you just RANSAC it?] RANSAC is a way of dealing with outliers for certain kinds of problems, but it's not really relevant here, because the fundamental assumption behind RANSAC is that all the points are related by the same underlying parameters you're trying to estimate, whereas here I can have a different (u, v) vector at every pixel. The key ingredient that's missing for RANSAC is consistency of the thing being estimated. RANSAC would be good for what I'm going to talk about next: if I decomposed the image into layers and modeled the flow within each layer as, say, a projective transformation like we talked about last time, then there are only a few parameters to estimate per layer, and I could tolerate outliers by saying, everyone in this foreground region, help me estimate these eight numbers; then I could RANSAC within each layer to weight the flow vectors. That kind of idea would work.

Here's a picture of what these different robust cost functions might look like. Generally, it's a good idea not to use a squared function in either the data term or the smoothness term.
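For concreteness, here are three penalty shapes of the kinds just described: the quadratic baseline, a roughly linear one, and a saturating one. These are standard robust penalties from the estimation literature, used here purely as illustrations:

```python
def quadratic(r):
    return r**2                      # outliers dominate the cost

def charbonnier(r, eps=1e-3):
    return np.sqrt(r**2 + eps**2)    # smooth, grows ~linearly in |r|

def geman_mcclure(r, s=1.0):
    return r**2 / (s**2 + r**2)      # tapers off: big outliers saturate
```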
So instead of writing the data term as a plain square, you'd have a term that looks like ρ(Ix·u + Iy·v + It), where ρ is one of these robust cost functions that mitigates big errors rather than squaring them. That robustness turns out to be very important.

Something else that was mentioned earlier: in theory, isn't the flow field reversible? If I have the flow from image 1 to image 2, shouldn't computing the flow from image 2 back to image 1 basically give the negatives of those flow vectors? The problem is that this doesn't always work very well, and one big reason is what are called occlusions. Occlusions come from situations where pixels are visible in one image but not in the other. Suppose in image 1 I can see the corner of the painting, and in image 2 it's occluded by the guy's head; on the other hand, in image 1 the guy is standing in front of the corner of the window, but in image 2 the window is revealed because of the different camera perspective. The problem is that there's no position in image 2 where pixel A has a good match; you just don't know what that intensity is. In the same way, there's no good match for pixel B back in image 1. If your flow algorithm forces every pixel in one image to match some pixel in the other image, you're going to have a bad time, because there will be occlusions. Depending on the situation, things may not be so bad: if I have a pair of frames from a video camera separated by a thirtieth of a second, the occlusions probably aren't severe. But if I have two images from a stereo rig where the two cameras are a meter apart, looking at the scene at the same time, I can have serious occlusions, like the rinds around foreground objects we were talking about for matting and inpainting: when I see behind an object, I simply don't have correspondences for those revealed pixels.

I'll talk more about occlusions when I talk about stereo correspondence, but at the very least you can build a flow algorithm that tries to check for occlusions and mitigate them. The easiest way to think about it: suppose I have a candidate optical flow field, and for this white pixel I estimate that the motion vector points to that black pixel over there. I can also compute the flow from image 2 back to image 1 and check where the flow vector of the black pixel points: does it point back to the position of the white pixel in image 1, or somewhere else? If life is good and things are symmetric, following the estimated flow forward and then back should return me to where I started; if it doesn't, I should throw out one of those flow vectors and not trust the flow there. And if you look at the picture, you can see there are two kinds of occlusions: pixels visible only in image 1, and pixels visible only in image 2; you can distinguish which case you're in by looking at how good the data term is for each of the two vectors. This is sometimes called cross-checking or left-right checking: if the check fails at a pixel, I throw that flow vector out. For the moment, the most I'll say is: just don't use those points. We'll say more in stereo about how that works, but there's not much you can do when you have occlusions; there just isn't a match you could assign that would be reliable.
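A minimal forward-backward (cross-check) sketch, reusing the warp-style sampling from before; the tolerance is arbitrary and the helper name is mine:

```python
def occlusion_mask(u12, v12, u21, v21, tol=1.0):
    """Follow the forward flow, look up the backward flow there, and
    flag pixels where the round trip does not come back to the start."""
    h, w = u12.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    u_back = map_coordinates(u21, [yy + v12, xx + u12], order=1, mode='nearest')
    v_back = map_coordinates(v21, [yy + v12, xx + u12], order=1, mode='nearest')
    err = np.sqrt((u12 + u_back)**2 + (v12 + v_back)**2)
    return err > tol                 # True where the flow is unreliable
```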
Another modification is called layered flow, and the idea is pretty straightforward. Again, think of a person moving in front of the background: the person and the background will appear to move at different rates as measured on the image plane, since stuff closer to the camera generally moves faster in the image than stuff further away. So I could say: I'm going to try to segment out the foreground, call the background layer 1 and the foreground layer 2, and then within each layer I might get away with estimating a parametric transformation: say, one projective transformation for layer 1 and another for layer 2, like we talked about last time. There's no single transformation that applies to the whole image, but if I can split the image apart along good boundaries, which in some sense is the matting or foreground-segmentation problem, then I can assign each layer a relatively simple motion and compute the whole optical flow field by compositing these motions back together in the appropriate depth ordering. Or, if I don't want to force every pixel in a layer to have exactly the same parametric motion, I can use what's called smoothness in layers: instead of the anisotropic approach, where I compute a smoothing direction for every pixel, I say I want everything within a given layer to be smooth, and I impose no smoothness assumption across layers.

There's a little extra step here that may not always be worth it, because estimating the layers is already a tricky thing, as we saw with matting: it's not necessarily easy to accurately draw a tight outline around the foreground. And as you saw in the matting chapter, for many kinds of foreground objects there's no clean, hard segmentation in the first place that would make a good layer. So when you see layered flow applied to real images, it's usually in contexts like cars and trees and so on, things where you can draw a nice hard boundary around the layers.
A couple of final thoughts. Another idea you may see around is what I'd call large-displacement flow. The assumption we made with the hierarchical method was that at every level of the pyramid the flow is relatively small. If I'm waving my hand slowly, the motion of my hand is small relative to the size of the thing that's moving, and that's fine. But imagine I'm playing volleyball and moving my hand really fast, or think of the ball itself: here's the ball in one image, and way over here in the next. If I take that down to the coarsest level of the pyramid, the ball is so small that I probably can't even see it in those tiny images; the object is maybe only a few pixels wide, while its motion vector is many pixels long. To estimate motions that are many pixels long, the hierarchical method has to build the long motion vector out of a bunch of smaller motion vectors seen at lower levels of the pyramid, but a small, fast object isn't even visible until I get to almost the finest level of the pyramid, and by that time it's too late: I don't have any big motion vectors to build on at that point. That's a problem for really fast-moving video, for example. One idea you may see is called SIFT Flow, which is a clever idea that replaces the brightness constancy assumption by requiring the SIFT descriptor at a pixel to match the SIFT descriptor somewhere else, a different kind of data term. But those kinds of methods are not really aimed at the optical flow problems you encounter in visual effects, where you're usually estimating flow between reasonably closely spaced frames of video, not between images taken from very different perspectives.

A final thing: there's so much effort in computer vision to make things fully automatic, but there's really no reason why that has to be true. There's a nice algorithm called human-assisted motion annotation, by Liu et al. The idea is the following: you have a video sequence where you want to estimate the flow, and the right-hand image might be the kind of result you get from a generic, fully automatic optical flow algorithm, where the color represents, say, the orientation of the flow vector and its saturation represents the magnitude (it's hard to represent a flow field as a color map). If you run a fully automatic algorithm, you may get a flow field you don't like, whereas you know that, for example, in this scene most of the pixels on this car should share one flow vector, and these other objects and the background should be different. So this algorithm asks you to provide a coarse, user-assisted annotation
So what this algorithm asks you to do is provide a coarse, user-assisted annotation of each of the foreground objects, and of what the ordering of the foreground objects is, and then it does this kind of smoothness-in-layers idea to find the best flow fields it can given the user constraints. In the real world that makes a lot of sense, right? Why chain yourself to an automatic algorithm that gives you some weird result, when it wouldn't take much effort to make a quick segmentation like this, where you draw boundaries around some rough objects and say, hey, algorithm, this is where I expect the chunks of flow to be, and then the algorithm goes off and does the best it can with your prior information? So what I want you to do on the homework is basically like this: I'm asking you to take three canned flow approaches off the shelf. One is the Horn-Schunck algorithm; one is the algorithm from the Brown paper, "Secrets of Optical Flow Estimation"; and the third is the human-assisted annotation. If you read that paper (you don't have to read the whole thing), it talks about best practices for all these little modifications I've been telling you about over the past fifteen minutes that make a flow algorithm really perform well, and the authors have a nice free implementation of what they consider a best-practices flow algorithm, along with a reference implementation of Horn-Schunck. So what I want you to do is: take the reference implementation of Horn-Schunck and run it on a couple of images you took yourself; how well does it do? Now take this nice modern optical flow algorithm; how well does that do? Now try it with the motion annotation, where you show the algorithm, hey, these are the layers I think you should have; how well does that do? So what I want you to do on the homework is fool around with some of these flow algorithms and see how well they work. And how do you tell how well a flow field looks? Well, you can do some spot-checking: exactly what was the flow vector at this point in image 1, and does it point to the right place in image 2? You can visualize quiver plots of how the flow looks, though it's hard to get a sense from a quiver plot of how good the flow really is, which is why I was suggesting that you cherry-pick some points in the image, look at those vectors, and check whether they point to the right places. And just like with matting, there's a big effort to benchmark optical flow algorithms. So now if you have a new optical flow algorithm, there are common benchmark datasets with known optical flow fields that have been very carefully ground-truthed; you submit your algorithm, and there's a constantly re-ranked list of the best flow algorithms. What you're seeing there is, in pixels, the error in the flow vectors between what the algorithm spits out and what the ground truth is. These days there are a zillion optical flow algorithms; you can see that this table has many, many entries, each with a note about how the algorithm works and a reference to where you can find the paper. Even here you can see there are a bunch of 2014 and 2013 papers that are still being evaluated on optical flow.
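The spot checks and quiver plots described here take only a few lines; this is a minimal sketch of my own (assuming (H, W, 2) flow arrays, with average endpoint error as the benchmark-style metric):

```python
import numpy as np
import matplotlib.pyplot as plt

def endpoint_error(flow_est, flow_gt):
    """Average endpoint error in pixels between an estimated flow field
    and a ground-truth flow field, both shaped (H, W, 2)."""
    return np.linalg.norm(flow_est - flow_gt, axis=-1).mean()

def quiver_flow(flow, step=16):
    """Draw a sparse quiver plot of a dense flow field, subsampled every
    `step` pixels so the arrows stay legible."""
    H, W = flow.shape[:2]
    ys, xs = np.mgrid[0:H:step, 0:W:step]
    u = flow[::step, ::step, 0]
    v = flow[::step, ::step, 1]
    plt.quiver(xs, ys, u, -v, angles='xy')  # negate v: image y points down
    plt.gca().invert_yaxis()
    plt.show()
```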
So hopefully when you write a paper, your algorithm comes out on top of this thing, because otherwise people are going to say, well, why don't I just use this other algorithm instead? So optical flow research is a pretty tough field to break into, because there's been so much work on it already. The last thing I want to say is: why do people use optical flow in visual effects? Now that we know what optical flow is, a very common thing to use it for is what I would call retiming a scene. So one key use of optical flow in visual effects is retiming. Suppose I have a camera with a constant frame rate, say a 60-frames-per-second camera, and I'm using it to record some faster-than-normal action; now I want to slow this down to, say, 45 frames per second. Suppose my first image stays the same. If I want 45 frames per second, that's like saying my next output image has to land somewhere between two captured images, right? I want an image that sits here, in between. So how do I synthesize that image? I wouldn't just take this image or that image; instead I'd say, okay, I'm going to take the image I got here and the image I got here and combine them to form this new image. And how would I combine them? Say I want to make a halfway image: one thing I could do is step halfway along the optical flow and produce the image I would get there. Even better would be to step halfway along the forward flow and halfway along the backward flow and average those two things to get the image. That way, I'm not stepping all the way along the full motion vector; I'm stepping only part of the way, putting the pixels down, and making a new image that looks better. This kind of retiming happens very, very often in visual effects production, where you want to take some shot footage and make it appear to run at a slightly different rate. On the interface side, someone just has a knob where they say, okay, slow this down by twenty percent, but what's really going on under the hood is that you're fooling around with the optical flow vectors to synthesize these new in-between images. Another thing that happens a lot is warping texture. For example, suppose I want to do an inpainting problem where I have a guy occluding the background, and in the next frame he moves over here, and I want to remove him from the background. What I would do (and this is a little bit of a combination of things, not just optical flow) is say: if I knew where the rest of the background was moving, and I observed this patch of pixels over here, I could flow it back over to here, as long as I knew what the flow was locally. So I can estimate what the pixel flow is around where I want to put the texture back in, and then push those flow vectors back into the image to put the texture down in the right place. This kind of moving pixels around based on optical flow is also a very common thing to want to do. Okay, so those are the two main reasons you would want to compute flow.
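Here's a crude sketch of that halfway-image idea (my own illustration under simplifying assumptions; production retiming tools handle occlusions and flow splatting far more carefully than this backward-sampling shortcut does):

```python
import numpy as np
import cv2

def halfway_frame(img1, img2, flow_fwd, flow_bwd):
    """Approximate the frame midway between img1 and img2.

    flow_fwd: (H, W, 2) flow from img1 to img2; flow_bwd: flow from
    img2 to img1. We sample img1 half a step along the backward flow and
    img2 half a step along the forward flow, then average. Evaluating the
    flows at the midpoint pixel grid is an approximation that is only
    reasonable where the flow varies smoothly.
    """
    H, W = img1.shape[:2]
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))

    map1x = (xs + 0.5 * flow_bwd[..., 0]).astype(np.float32)
    map1y = (ys + 0.5 * flow_bwd[..., 1]).astype(np.float32)
    map2x = (xs + 0.5 * flow_fwd[..., 0]).astype(np.float32)
    map2y = (ys + 0.5 * flow_fwd[..., 1]).astype(np.float32)

    warp1 = cv2.remap(img1, map1x, map1y, cv2.INTER_LINEAR)
    warp2 = cv2.remap(img2, map2x, map2y, cv2.INTER_LINEAR)
    return cv2.addWeighted(warp1, 0.5, warp2, 0.5, 0.0)
```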
And it's actually a very useful thing to know where all the pixels in the image are moving, right? Okay, so that's, again, the very high-level overview of optical flow. To really get into the mud of how all these refinements work, a good place to start is the "Secrets of Optical Flow Estimation" paper; it's a nice paper for learning, number one, what the basic idea is, and number two, all the little things you should do to make your algorithm a little bit better. Okay, so any questions or comments? [A student asks whether the texture-warping idea implies that you can use optical flow for hole filling.] Right, so optical flow is one way to attack the inpainting problem, yeah; if you have lots of information from future frames, you can use optical flow to push stuff back into the holes. Yes, that's definitely true. Okay.