Augmenting Reality #1 - Optical Flow (C++)

Video Statistics and Information

Captions
Hello, and welcome to part one of an infrequent series about augmenting reality. In this series I'm going to be learning about and exploring the algorithms that operate behind the scenes and enable augmented reality applications, and I think a good place to start is with one of the more enjoyable algorithms: optic flow.

Before we get stuck into the algorithm, I thought it might be worth having a look at the setup. What I've got here is my webcam, at a low resolution, displaying a grayscale image at the command prompt. I've done a whole video about that already, so I'm not going to go into the details of the webcam application here. You can see I've currently got a red rectangle stuck to my head, and the idea of optic flow is that I can create a velocity map that allows me to interact with things in the scene. You can see I can manipulate the rectangle with my hands. I'm doing it rather gingerly and slowly, because if I hit the rectangle it picks up more velocity, so there's some sort of physics involved in how the rectangle behaves, and if the rectangle goes off the screen I've made it wrap around to the bottom. We can see now, rather unfortunately, it's got stuck on my microphone, so if I move my hands around you can see the microphone's in the way, so I need to go in front of the microphone and lift the object up, or at least flick it out of the way. Come on... there we go, out of the way. So this is a crude form of augmented reality, but the algorithm behind it is quite nice, and it also covers some of the fundamentals of image processing. What I'm trying to demonstrate is that, for the most part, the object behaves according to how I'm interacting with it: I can push it one way or the other. Let's try and find it again one last time... there we go. I'll bring the rectangle to the middle of the screen and just try and leave it there. Maybe I can push it to the left, and push it back to the right a little bit. I'll try and grab hold of it from above and bring it back down... where am I going... there we go, perfect. All right, that's enough. Let's have a look at how it works.

Let's start by considering how motion is determined between two successive images. Here I've got two pictures on the left, which are frames taken from a camera. T-0 is the most recent frame, and T-1 is the frame before that, the previous frame. We can see that the little chappy inside the frame has waved his arm; it's moved a little bit. If we subtract the two images, what we get is not the whole body but just the two locations that have changed, so we get the original arm plus the new arm. Depending on how the image is encoded, the type of the movement, and the luminance on the screen, one of these will be positive and one will be negative. If we take the absolute values of all of the pixel values, i.e. set them all to positive, we can work out an area where motion has occurred. But all this map tells us is that motion has happened, and whereabouts in the image it has happened; we don't know in what direction, and we don't know how much, because in the world of image processing our motion is represented simply as the difference between two successive frames in time.
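That per-pixel absolute difference is simple to express in code. Below is a minimal standalone sketch of the idea, not the video's actual source; the row-major frame layout, the normalised 0-1 float values, and the function name are my own assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Motion map as the absolute per-pixel difference between two successive
// grayscale frames (row-major, values normalised to 0.0f..1.0f).
std::vector<float> MotionMap(const std::vector<float>& frameNow,
                             const std::vector<float>& framePrev)
{
    std::vector<float> motion(frameNow.size());
    for (std::size_t i = 0; i < frameNow.size(); i++)
        motion[i] = std::fabs(frameNow[i] - framePrev[i]); // sign discarded
    return motion;
}
```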
Before we start working out the motion vector for individual pixels, let's consider how we would do it for the whole image. Imagine a scenario where the camera is moving, but the scene the camera is viewing remains mostly static. In this case I've got two successive frames, shown by the black and the blue images, and our brains can work out that, even though they're similar, one of the frames has moved slightly; in fact the camera must have moved in this direction, because we can see that the little man has moved towards the left of the frame. If we overlay the frames precisely, we can see this more clearly, and the result we're interested in is how much movement has occurred.

There's really only one way to test this. Because this is the first video in the series, I'm trying to keep things conceptually simple, so I'm not going to be looking into the more advanced feature-extraction routines which would typically be used to solve this problem. Instead, we're going to brute-force it. To calculate the motion vector, the only thing we can do is physically test the new frame in a variety of different positions to see where it lines up with the old frame. We just need to algorithmically overlay the two and see where they have the closest match, i.e. where the difference between them is at a minimum, and of course it looks like this. Once we've found the closest match, we can observe the nature of the vector that we need to represent that motion. Don't forget, this is for the whole image moving, as if the camera is moving, and because it's the whole image we can assume that the vector has its origin at the middle of the image; we have only the one vector for the whole image, and this is our global optic flow. Your computer mouse operates on a similar principle: it rapidly takes images in succession and tries to work out how the image has translated in the 2D plane from one location to another. Once it's done this, that vector is converted into coordinates that are sent to the computer to move the mouse cursor around.
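Here's a minimal standalone sketch of that global brute-force test: slide the new frame across a small window of candidate offsets and keep the offset with the smallest total absolute difference. The names, the search radius parameter, and treating off-edge samples as zero are my own assumptions, not the video's source.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Brute-force global optic flow: try every (dx, dy) offset in a small
// window and keep the one where the shifted new frame best matches the
// old frame (smallest sum of absolute differences).
std::pair<int, int> GlobalFlow(const std::vector<float>& oldFrame,
                               const std::vector<float>& newFrame,
                               int w, int h, int radius)
{
    float fBestDiff = 1e30f;
    std::pair<int, int> vBest{0, 0};
    for (int dy = -radius; dy <= radius; dy++)
        for (int dx = -radius; dx <= radius; dx++)
        {
            float fDiff = 0.0f;
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++)
                {
                    int sx = x + dx, sy = y + dy;
                    // Samples shifted off the edge count as zero
                    float s = (sx >= 0 && sx < w && sy >= 0 && sy < h)
                                  ? newFrame[sy * w + sx] : 0.0f;
                    fDiff += std::fabs(oldFrame[y * w + x] - s);
                }
            if (fDiff < fBestDiff) { fBestDiff = fDiff; vBest = {dx, dy}; }
        }
    return vBest; // the single motion vector for the whole image
}
```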
Now, the type of optic flow we're interested in isn't global optic flow as such; it's local optic flow. We want to work it out per pixel, so every pixel in our image gets associated with a vector that suggests how that pixel might have moved in the past. Things that remain static don't really have a vector at all: the body here didn't change; in fact, the only thing that did change was the arm. Again I've got our two successive frames, and if I overlay them we can see the difference is really just the arm changing position.

For a given pixel, I'm going to try and work out where that pixel has come from. To do this, I create a patch that surrounds the pixel. This patch will be a small fragment of the image, and it will have some statistical features that make it different to other parts of the image. I'm going to assume that's the case for now; of course it might not be, and that could lead to errors in this algorithm. Once I've got a small patch, I'm then going to test that patch against lots of different locations within a given search area. We need to restrict the search area for two reasons: firstly, if we searched the whole image it would be tremendously slow, as it's not a very computationally efficient algorithm; secondly, we can assume that the differences between frames are on the whole minimal, particularly if it's a human being; I can't move my arm so fast that it makes a massive difference at, say, 30 frames per second. For each patch in the search area, I record the similarity with the base patch, and in this instance the best match would be somewhere like here: they're not exactly the same, but they represent a similar feature of the arm, and therefore for this patch to get from the location in green to the location in red, it had to move with this vector. Once we've found the best-matching patch for every pixel in the image, we can then modulate our vector field map with the original motion image, because a lot of the vectors we're not interested in: where there hasn't been any motion, they'll just be the result of noise and other unwanted estimations and by-products of the searching algorithm. So we restrict them to just the pixels we're interested in, and all of the other vectors effectively get set to zero.

Now let's start with the coding, but I want to emphasise that I'm not making any effort at all to optimise this algorithm; in fact it's going to be very inefficient and run quite poorly indeed. The reason for this is that I want to keep it clear; I want to keep the step-by-step nature of the algorithm very visible for viewers to study how the algorithm works. I do intend to follow up this video with another video that specifically talks about optimising this algorithm using a technique called integral images, but for this video we're going to keep it simple and brute-force, and we're going to pay the price for that simplicity.

As usual I'm using the olcConsoleGameEngine, and if you've seen the webcam video, we're going to use a library called escapi to handle the webcam for us. I've derived a class from the console game engine called AR OpticFlow. Because the performance is going to suffer in this algorithm, I'm using quite a low-resolution console: 80×60 characters, where each character is 16×16 pixels. I've taken the liberty of already entering the webcam capture code; we don't need to see it again. This involved a union and some variables to handle and represent the camera, and some code to initialise the camera to a given size, in this case the size of our console, using the ScreenWidth and ScreenHeight functions. In the OnUserUpdate function, I've created a lambda function to simply draw an image to the screen; in this case an image is going to be a two-dimensional floating-point array, and the lambda just iterates through the array and chooses the appropriate combination of characters and colours to represent each pixel.

The first thing the OnUserUpdate function is going to do is capture the image and convert it into normalised grayscale values, which we'll use for the rest of the algorithm. The frame returned by the camera, after processing for luminance, is going to be stored in a 2D array called fNewCamera, so let's add that now. For the arrays, I'm just creating a pointer to a floating-point variable, and in the OnUserCreate function I'm going to allocate the memory depending on the size of the console and initialise it to zero. Just to make sure everything so far is in order, I'm going to call the lambda function DrawImage and pass to it the 2D array that represents our new camera, and we should be able to see the raw image on the screen. Let's take a look... and there we go. It's a bit low-resolution, a bit fuzzy. Just to prove that this really does work, I'm going to quadruple the resolution into something a bit more standard for a webcam, and now, hello, you can see me quite clearly.

Let's start by just calculating motion within the camera scene. I need to record the previous frame, so I'm going to create another array called fOldCamera, and our motion is going to be the difference between these two arrays. I'm not going to show this each time, but each of these arrays gets allocated and set to zero in the same way.
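For completeness, here's how one of those image buffers might be allocated and zeroed; a sketch in the spirit of the video's raw float-pointer arrays, with a helper name of my own rather than the actual source.

```cpp
#include <algorithm>

// Allocate a width x height image of floats, indexed y * width + x,
// and initialise every pixel to zero.
float* AllocateImage2D(int width, int height)
{
    float* img = new float[width * height];
    std::fill(img, img + width * height, 0.0f);
    return img;
}

// In OnUserCreate (olcConsoleGameEngine), something along these lines:
//   fNewCamera = AllocateImage2D(ScreenWidth(), ScreenHeight());
//   fOldCamera = AllocateImage2D(ScreenWidth(), ScreenHeight());
```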
We can see here that on each OnUserUpdate we completely overwrite the fNewCamera array with the new luminance values, so before we overwrite the new one, we should store it into the old one. This loop operates on a pixel-by-pixel basis; it goes through every single pixel in the image, giving us an x and a y coordinate. This allows me to create a new variable called fDiff, which represents the absolute difference between two corresponding pixels of the two arrays; the pixels are in the same place, but the arrays are different. I'm going to create a lambda function called GetPixel to do the sampling into these arrays, and I'll put this lambda function with my other one. The reason for doing this is that I can check that the coordinates into the array are in bounds, and I can also specify it to return zero if they're not, which we'll find to be quite useful later on.

Once I've calculated the difference per pixel, I'm going to fill a third array, the motion image, with those differences, but I'm going to ignore all the differences that are so small they don't contain any useful information. One of the things about working with webcams and image processing is that you're working with real-world data, and it's rubbish; the real world will throw absolutely everything it can at you to interfere with your progress. You'll find that in a lot of image processing algorithms we have to threshold and tweak values to make things work. So, to account for some of the simple pixel noise that's going to occur between frames (because it's never really going to be the same pixel value twice), I'm going to set the difference to zero if it's below a threshold.

I'm curious to see our motion image, and because of the DrawImage lambda function we can easily do that at any time. In a still scene there's not much at all; if I move, we can see the difference between two successive frames, and it's quite sensitive: I'm just ever so slightly moving my microphone around, and we're getting basically an edge detect, though we're not interested in edges for this video. Even though I'm trying to be as still as I possibly can, we can see there are lots of spurious elements firing each time, and that's just through little vibrations of the camera mount, camera noise that's being picked up and not filtered out, and an estimation of how much motion really qualifies as motion. I'm trying to keep my head as still as possible, but just because I'm talking it's moving around a little bit, and I don't really think that should interact with the world. So I want to now remove small movements, and by small movements I mean rapid and large changes between frames; these are not likely to be natural. To remove the temporal high-frequency components of the image, I'm going to add a low-pass filter. The low-pass-filtered image is going to be yet another 2D array I've created further up in the code. Into this image I accumulate a small amount of the difference between the filtered image and the new image: here we can see I've got the new image, I'm subtracting the filtered image, taking 10% of the result, and accumulating it into the filtered image. And of course, with my DrawImage lambda function, I can see what that looks like too.
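Putting those three pieces together (the bounds-checked GetPixel, the noise threshold, and the 10% low-pass accumulator), a sketch of one update step might look like this. The threshold of 0.05 and the container types are my own illustrative choices, not values confirmed by the video.

```cpp
#include <cmath>
#include <vector>

// One per-frame update: low-pass filter the incoming camera image and
// build a thresholded motion image from the frame difference.
void UpdateMotion(const std::vector<float>& fNewCamera,
                  const std::vector<float>& fOldCamera,
                  std::vector<float>& fFilteredCamera,
                  std::vector<float>& fMotionImage,
                  int w, int h)
{
    // Bounds-checked sampler; out-of-range coordinates return zero,
    // as described for the GetPixel lambda in the video.
    auto GetPixel = [&](const std::vector<float>& img, int x, int y)
    {
        return (x >= 0 && x < w && y >= 0 && y < h) ? img[y * w + x] : 0.0f;
    };

    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            int i = y * w + x;

            // Temporal low-pass: accumulate 10% of the difference between
            // the new image and the filtered image.
            fFilteredCamera[i] += (GetPixel(fNewCamera, x, y) - fFilteredCamera[i]) * 0.1f;

            // Absolute per-pixel difference, with tiny values treated as noise.
            float fDiff = std::fabs(GetPixel(fNewCamera, x, y) - GetPixel(fOldCamera, x, y));
            fMotionImage[i] = (fDiff < 0.05f) ? 0.0f : fDiff;
        }
}
```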
Let's take a look. We can see the image sort of fades into existence and becomes very stable; in fact, I'm talking and you can't see my mouth move, and if I do move, I ghost. If I move my arm up and keep it still, it takes some time for the arm to appear. So we've effectively removed all of the little high-frequency jitters, and it's quite ghostly; if I move very rapidly, waving my arms around, you see it never really settles, so it's ignoring this high-frequency motion. Of course, this 10% is quite tunable, so if I set it to, say, 80% instead, we're allowing far faster motion through. You see there's still a little bit of ghosting, but not very much at all; my mouth still doesn't look like it's moving as much as it normally should, and if I open it you see it takes some time, so my mouth is moving at such a speed that it's being filtered out by the algorithm. But because we're taking a larger chunk now, we're allowing more high-speed components through; we can see that the boundary of the lights here in the background is quite flickery and jittery, and of course this will look like motion to our algorithm, because it is a change of pixel values between successive frames.

Now that we've got our filtered image, we need to update our motion calculation to use the filtered images, because currently it's using fNewCamera and fOldCamera, which are not filtered at all. So I'm going to change this line to use the filtered variants instead, and you can see I'm going to need to create an old filtered camera variable to accommodate this; I'm going to update the old filtered camera array in the same place that I'm updating the simple old camera array. This means my motion image is now really robust and only contains significant motion.

Since we've acquired an image, the next phase is to calculate the optic flow vector map. I'm going to shift the DrawImage call into its own section now, Update Screen. As already discussed, for this video we're just going to do things in a brute-force way; we're not thinking about efficiency just yet. So I'm creating two variables, both integers: one is the patch size, which is going to be the square dimension of the patch that we're looking for, so in this case a feature could be a 9×9 collection of pixels; and I'm going to specify the search size, which gives a 7×7 grid of pixels through which to search with the patch.

Let's just have a quick look at some numbers to see why this approach isn't particularly efficient. For a 320×240 array of pixels, that's 76,800 pixels. We're going to be performing a search over a 7×7 grid, so that's 49 possible locations for each pixel, which means we've got 3,763,200 search locations. For each search location we've got a patch of image to test; our patch is going to be 9×9 pixels, so for each of our 3.7 million searches we've got 81 comparisons to make, which means 304,819,200 comparisons between pixels per frame. And let's assume we want to run that at 30 frames per second: that's an easy total of 9,144,576,000 individual pixel comparisons per second to run at a real-time rate. When you factor in the operating system, the fact that it's all random access to memory, the pollution of your cache, and lots of other problems, we can see that this algorithm isn't particularly efficient at all. So, to help us out, I'm going to significantly reduce the resolution of the algorithm to get it to run at at least double-digit frame rates on my fairly high-end machine. But don't forget, there's going to be a follow-up video which talks about solving this problem.
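Those counts are easy to verify; here's the same back-of-envelope arithmetic as a runnable check (entirely my own snippet, just reproducing the numbers above).

```cpp
#include <cstdio>

int main()
{
    long long pixels      = 320LL * 240;        //        76,800 pixels
    long long searches    = pixels * (7 * 7);   //     3,763,200 search locations
    long long comparisons = searches * (9 * 9); //   304,819,200 comparisons per frame
    long long perSecond   = comparisons * 30;   // 9,144,576,000 comparisons per second
    std::printf("%lld pixel comparisons per second\n", perSecond);
    return 0;
}
```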
So let's now go back to our original resolution, which does mean we can make the pixels bigger. We're going to have quite a few nested for-loops here, so it's going to be a little tricky for me to keep everything on the screen without zooming out. We know to begin with that we're going to go through every single pixel, so let's start there: let's create two for-loops that iterate across the screen and down the screen, and I'm going to initialise the variables for this pixel. For each pixel we're going to store a variable called fPatchDifferenceMax, which is set to a very large number and keeps track of which of our search patches is the closest fit to our base patch. I'm also going to initialise to zero two more two-dimensional arrays, flowX and flowY, which are the x and y components of our overall motion for that pixel.

Now I need the next stage of my for-loops. For each pixel we need to go over a search area; in this case the search area is just going to be a rectangle surrounding the original pixel, so I'm going to create a vector that represents the offset from the original pixel to the centre of the search patch. And I'm going to create another variable, fAccumulatedDifference, which is going to be the total difference for a single patch at this search location, and it will be this accumulated difference which is compared against the best so far, to see which patch is the closest fit to the base patch.

Now we need the next set of for-loops, for the patch. Since the search vector represents the middle of the patch being searched, I can create the actual pixel coordinates as an offset of that search vector, assuming the search vector is in the middle of the patch, so it's very similar to what we had before. In a similar way, I can work out the indices required for the base patch, which is basically just the patch-size area around the pixel we're testing. Now that I've worked out the indices for the pixels in both the searched patch location and the base patch location, I can extract the luminance values of the two successive frames at those locations and update my accumulated difference. Make sure to use the absolute-value function when accumulating the difference, because there will be some negative values, and they would otherwise subtract from the difference. So for each pixel, for each search location, and for each individual patch pixel, we accumulate a difference, and this means we can compare patches. It's important to compare patches because I want the patch which is the closest match to the base patch, so if my accumulated difference is lower than the current best, i.e. there is the least difference, then I store in my 2D flow fields the search vector we created earlier. As you can see, we frequently use this sort of notation, which always implies "per pixel": y * screenWidth + x.

We've now finished the brute-force algorithm, and every pixel in the flow field has been allocated a vector of motion. It's time to modulate this flow field with the motion image: if the motion image contains motion, which remember has been quite severely filtered earlier on, then we know this motion must be significant; else, if there is no motion, we set the flow field vectors to zero to indicate as such. The reason we need to do this modulation is that the patch-search routine always has to come up with a value: it has to choose a vector even if there has been no motion. Hopefully that value indicates zero, but it might not, and we want to stop these spurious motion vectors from contaminating our vector flow field.
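Here's how all of those nested loops might fit together: a standalone sketch of the per-pixel patch search, using std::vector buffers and the sizes from the video (9×9 patch, 7×7 search window). Names and container types are mine, not the video's exact source.

```cpp
#include <cmath>
#include <vector>

// Brute-force local optic flow. For every pixel: test a patch around it
// against every offset in the search window, keep the offset with the
// least accumulated absolute difference, then zero the vector wherever
// the (filtered, thresholded) motion image reports no motion.
void CalculateOpticFlow(const std::vector<float>& fOldFiltered,
                        const std::vector<float>& fNewFiltered,
                        const std::vector<float>& fMotionImage,
                        std::vector<float>& flowX, std::vector<float>& flowY,
                        int w, int h, int nPatchSize = 9, int nSearchSize = 7)
{
    auto GetPixel = [&](const std::vector<float>& img, int x, int y)
    { return (x >= 0 && x < w && y >= 0 && y < h) ? img[y * w + x] : 0.0f; };

    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            float fPatchDifferenceMax = 1e30f; // least difference found so far
            flowX[y * w + x] = 0.0f;
            flowY[y * w + x] = 0.0f;

            // Search window: a rectangle surrounding the original pixel
            for (int sy = -nSearchSize / 2; sy <= nSearchSize / 2; sy++)
                for (int sx = -nSearchSize / 2; sx <= nSearchSize / 2; sx++)
                {
                    float fAccumulatedDifference = 0.0f;

                    // Compare the candidate patch against the base patch
                    for (int py = -nPatchSize / 2; py <= nPatchSize / 2; py++)
                        for (int px = -nPatchSize / 2; px <= nPatchSize / 2; px++)
                        {
                            float fNew = GetPixel(fNewFiltered, x + sx + px, y + sy + py);
                            float fOld = GetPixel(fOldFiltered, x + px, y + py);
                            fAccumulatedDifference += std::fabs(fNew - fOld);
                        }

                    // Record the best (least different) match as the flow vector
                    if (fAccumulatedDifference < fPatchDifferenceMax)
                    {
                        fPatchDifferenceMax = fAccumulatedDifference;
                        flowX[y * w + x] = (float)sx;
                        flowY[y * w + x] = (float)sy;
                    }
                }

            // Modulate with the motion image: no significant motion, no vector
            if (fMotionImage[y * w + x] == 0.0f)
            {
                flowX[y * w + x] = 0.0f;
                flowY[y * w + x] = 0.0f;
            }
        }
}
```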
Now we effectively have a 2D map of velocity per pixel, and we can have some fun with this by using those vectors as part of a physics calculation to control an object, i.e. the red rectangle you saw me manipulating at the start of the video. I've created two pairs of variables (pairs because there's an x and a y component): one which represents the position and one which represents the velocity, and in OnUserCreate I'm going to initialise the ball's position to be in the middle of the frame. A number of my videos this year have centred on doing some rudimentary physics for games, and this one is no different; in fact, you can look at the Flappy Bird video for a closer look at what we're about to discuss. Very simply, we're going to update the ball's velocity based on its position within the velocity field, and here you can see the individual components being updated. We use the ball's location in the optic flow field to find the component of the vector with which to augment the velocity: in the x-component field, I use the ball's current position to find the correct index into that field, which gives me the x component of the optic flow vector, and I multiply that by a constant that represents some sort of speed; this will need to be tuned depending on your application. I also include fElapsedTime just to keep things running smoothly, as the algorithm's frame rate can fluctuate quite a bit. I do the same for the y component. The ball's position is simply augmented by the velocity with respect to fElapsedTime; I've included the value 1.0 here just in case you need a tuning parameter to give the ball some weight. I've also added a drag effect, simply reducing the velocity by a fixed proportion over time; this means the ball's velocity will tend toward zero when it's not being governed by the optic flow map, which stops it just drifting continuously in one direction. And finally, when the ball does reach the edge of the screen, I'm going to wrap it around to the other side. This means we've got nothing left to do other than draw the screen, and we'll use the realistic camera image, not the filtered one, because that's what the user might expect to see. And I'm going to draw the ball, and of course it's not really a ball, it's a rectangle; I'm going to use the Fill command provided by the game engine to do so.
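A minimal sketch pulling those physics steps together. The fSpeed and fDrag constants are illustrative tuning values of my own, not numbers taken from the video.

```cpp
// Update the "ball" (really a rectangle) from the optic flow field:
// accelerate from the flow vector under it, integrate position, apply
// drag, and wrap around the screen edges.
struct Ball { float px, py, vx, vy; };

void UpdateBall(Ball& b, const float* flowX, const float* flowY,
                int w, int h, float fElapsedTime)
{
    const float fSpeed = 400.0f; // tuning: how strongly flow pushes the ball
    const float fDrag  = 0.8f;   // tuning: proportion of velocity lost per second

    // Index of the flow vector under the ball (y * screenWidth + x)
    int i = (int)b.py * w + (int)b.px;

    // Augment velocity from the local optic flow vector
    b.vx += flowX[i] * fSpeed * fElapsedTime;
    b.vy += flowY[i] * fSpeed * fElapsedTime;

    // Integrate position (the 1.0f is a spare "weight" tuning parameter)
    b.px += b.vx * 1.0f * fElapsedTime;
    b.py += b.vy * 1.0f * fElapsedTime;

    // Drag: velocity tends toward zero when nothing is pushing the ball
    b.vx -= b.vx * fDrag * fElapsedTime;
    b.vy -= b.vy * fDrag * fElapsedTime;

    // Wrap around the screen edges
    if (b.px < 0.0f) b.px += w;
    if (b.px >= w)   b.px -= w;
    if (b.py < 0.0f) b.py += h;
    if (b.py >= h)   b.py -= h;
}
```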
Let's take a look. Here we can see the red rectangle in the middle of the screen, and I can push it up there, and I'll try and grab hold of it and bring it back down. If I try (I've got to get my hands in the right order) I can push it over towards my other hand, bring it back, and push it again, so you can see I've actually got quite reasonably accurate control over it. I might not have reasonably accurate control over my own arms, but at least I can move the rectangle around the screen in some way. Let's try and bring you back down here. Now, it's quite interesting, this: it's got stuck on the corner of the screen, and we'll discuss that in a minute, once I've finished playing with this; it's actually quite addictive. Over to my other hand... and if I try and thump it, the higher velocity sends it on its way. Easy!

So why was it clinging to the edge of the screen? Well, in the comparison phase of the algorithm, where we're comparing patches, we call the GetPixel lambda function, and for pixels close to the edge of the array, some of the sampled coordinates will lie beyond the edges of the array. We've told our lambda function to return zero in this case, so we get an unfair comparison. We'll be looking at handling this edge effect, and introducing some serious optimisation through the use of integral images, in the follow-up video in this series.

If you've enjoyed this video, give it a big thumbs up. I know it's been a bit more technical than some of my other videos, but I think it's an interesting topic nonetheless. Have a think about subscribing; I see the numbers are still increasing, so I'm really pleased about that. Thank you to all of the new subscribers, and I'll see you next time. Take care.
Info
Channel: javidx9
Views: 22,365
Keywords: one lone coder, onelonecoder, learning, programming, tutorial, c++, beginner, olcconsolegameengine, command prompt, ascii, game, game engine, augmented reality, optic flow, image processing, webcam, video processing, feature extraction, motion, optical flow, motion detection, vector fields, augmenting reality, AR
Id: aNtzgoEGC1Y
Length: 24min 33sec (1473 seconds)
Published: Wed Nov 15 2017