Computer Vision with MATLAB for Object Detection and Tracking

Captions
Welcome to the webinar on computer vision with MATLAB. My name is Bruce Tannenbaum, and I am a technical marketing manager here at MathWorks for image processing and computer vision applications. Over the years we've seen our customers do some very exciting and innovative projects with our tools: NASA created very successful Mars rovers, BMW created a parking assistance system, DigitalPersona created fingerprint recognition algorithms, and a team of students from Eindhoven won the RoboCup championship for robot soccer using MathWorks tools. While these applications had many different requirements, they can all be classified as computer vision applications.

Computer vision can be defined as the use of images or video to understand a real-world scene. While it relies on a foundation of image processing, it adds algorithms to detect, identify, classify, recognize, and track objects or events. To understand what this means, let's say we're building a system to monitor a busy city road intersection, as in the image on the lower left part of the slide. After we capture data from a camera, first we want to remove some noise, adjust contrast, or perform some other form of pre-processing. Then we want to identify relevant objects in the scene, like the pedestrian on the left or the red car in the center. Then we might want to recognize events: if that red car just went through a red light, we might want to issue a traffic citation to the number on the license plate. Identifying these objects and events is the job of the computer vision system.

Computer Vision System Toolbox is our key product for computer vision with MATLAB. It contains algorithms for feature detection, extraction, and matching; feature-based registration using RANSAC; object detection and tracking; stereo vision; and video processing. While the basic capabilities for reading and writing video files are in MATLAB, this toolbox adds visualization capabilities, including functions to annotate video frames that I'll show later. There are a lot of interesting capabilities in the Computer Vision System Toolbox, but today we are going to focus on object detection and tracking. In the past few releases we have shipped a number of capabilities that let you do more than what was possible with just image processing. I'll be showing you face detection, people detection, and several new tracking capabilities, including Kalman filtering.

You could say that we've supported object detection for many years with our tools. In image processing, two primary techniques come to mind: blob analysis and template matching. With blob analysis, we use segmentation to find objects and then measure their properties. This works well in situations where we can assume that the objects we find by segmentation are the ones we want, but it doesn't work well in more complicated images where segmentation is difficult or there are many types of objects. With template matching, we take a small template image that represents the object we're looking for and search the entire image for a match, typically using normalized cross-correlation. Template matching is commonly used in machine vision, where we can control the camera and object locations as well as lighting conditions, but it is not robust to rotation, occlusion, or changes in object size.

Fortunately for us, the world of computer vision research has made great strides in its ability to detect objects. Computer vision approaches to object detection are often referred to as feature based. A feature is a keypoint, or a descriptor of a region, that can be found reliably in many images.
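As a reference point for the feature-based methods that follow, here is a minimal sketch of the template-matching approach just described, using normxcorr2 from Image Processing Toolbox. The file names are placeholders, not files from the webinar.

    % Template matching via normalized cross-correlation (hypothetical file names)
    scene    = rgb2gray(imread('scene.png'));
    template = rgb2gray(imread('template.png'));
    c = normxcorr2(template, scene);           % correlation surface
    [~, idx] = max(c(:));                      % strongest match
    [yPeak, xPeak] = ind2sub(size(c), idx);
    % Top-left corner of the matched region in the scene
    yTop = yPeak - size(template, 1) + 1;
    xTop = xPeak - size(template, 2) + 1;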
A fairly basic feature to understand is a corner, like in this image, second from the left. A corner consists of a substantial gradient change in two dimensions, just like the edge of the car here, or the corner of the license plate on the back. These features are important to many computer vision workflows, one of which happens to be object detection and tracking.

Today I'm going to show you three different examples of feature-based object detection with Computer Vision System Toolbox in MATLAB. The first approach is feature matching, which you'll see is somewhat similar to template matching but more robust in many ways. So let's take a look at feature matching. In this demo we want to find the queen of diamonds, which we have in a separate file as shown here on the right. We're going to use SURF features to represent the card and then look for those features in the larger image down below. This approach is more robust than template matching because we can cover up part of the card and still detect enough features to find it in the larger image. The features are also invariant to scale and rotation, so we could move the card around and still detect it. Template matching lacks this robustness to occlusion, scale, and rotation.

Let's switch to MATLAB to see how we can detect the queen of diamonds. First we load the small image of the queen of diamonds using imread. We convert it to grayscale using the rgb2gray function from Image Processing Toolbox, and then comes our first function from the Computer Vision System Toolbox, detectSURFFeatures. This detects the SURF features, which we put into this variable here. We extract them and then show the strongest ones on top of the image. What you'll see is that the features are all over the image, generally in places where there's a lot of contrast. Now let's look at the individual features, the strongest 25, and see what they look like. You can see the Q was one feature; here the upper part of the Q, maybe a larger patch around the Q; here there's a whole bunch of features at the corner, around the hair of the queen. All sorts of different spots in the image were recognized as features worth grabbing. What SURF is basically looking for are blobs, particularly blobs with some contrast at their edges; that's why it picked up these features, and this is what SURF features look like.

Now let's load the larger image, where I am holding the queen of diamonds card. We detect the SURF features on that image as well and visualize them. You can see it found a whole bunch of other features: around my glasses, around the light switch, and here on the card itself. What we want to do now is find the features that match the ones we found on the card, and those matches should be mostly in this area. We can use a function called matchFeatures to match them, from the reference card (the queen of diamonds) to the image I just loaded. We get index pairs out, and then we can show the matched features, drawing lines between them in the visualization. It found a whole bunch of matching features, and you can see they're all on the card, and they all match the queen of diamonds reference image.
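A minimal sketch of the detect-extract-match workflow just described; the image file names are placeholders, but the functions are the ones named in the demo.

    % Detect and extract SURF features on the reference card and the scene
    refImg   = rgb2gray(imread('queenOfDiamonds.png'));   % hypothetical file names
    sceneImg = rgb2gray(imread('clutteredScene.png'));
    refPts   = detectSURFFeatures(refImg);
    scenePts = detectSURFFeatures(sceneImg);
    [refFeat,   refValidPts]   = extractFeatures(refImg,   refPts);
    [sceneFeat, sceneValidPts] = extractFeatures(sceneImg, scenePts);
    % Match descriptors and visualize the putative matches
    idxPairs     = matchFeatures(refFeat, sceneFeat);
    matchedRef   = refValidPts(idxPairs(:, 1));
    matchedScene = sceneValidPts(idxPairs(:, 2));
    showMatchedFeatures(refImg, sceneImg, matchedRef, matchedScene, 'montage');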
You might think we're done, but really we're not, because we don't know exactly where all the features are, and sometimes we get an outlier: maybe something matched from the tip of my hair to the queen's hair. So we need to make sure we've really found the region represented by those points on the card, and the way we can do that is to estimate the geometric transformation that maps the features from the queen to where they match up in the larger image. For this we're going to use something called RANSAC. What RANSAC does is go through the features we found, build a geometric model to locate that region, and discard any outliers. We apply that, and what you can see now is that we've found all the features that truly match in the image. Now that I have a geometric transformation that maps the region (that's this matrix here), I can take the corners that represent the width and height of the reference image and project them onto the new image, and now I can see exactly where the card is. This is, in effect, object detection: locating this flat rectangular card in the image by recognizing the group of features, figuring out which are outliers versus inliers, using matching and then RANSAC to estimate where the object is located and what the geometric transformation is. Let's go back to our slides.

Let's talk a little more about RANSAC and how it works. What we're doing is iteratively estimating the parameters of a mathematical model. Look at the pictures down below: imagine this is our data, and we're trying to fit a line to it. If we fit a line to all of the data equally, like we would in a least-squares approach, we'd probably get a roughly horizontal line, because all of these outliers up here and down here would pull the line flatter than what the important data really suggest. With RANSAC, we instead iteratively take a minimal subset of points: for a line we take two points (a geometric transformation needs more), create a model from them (two points give us a line), and then see how many of the remaining points fit that model. We do this over and over again; in the code I just ran, we ran it 2000 times. Then we pick the model that fits the highest number of points, and that becomes the model RANSAC outputs. RANSAC is very useful: you can use it to estimate a geometric transformation, and you can also use it for stereo vision. In this case we used it to estimate a geometric transformation that gave us the bounding box for our object detection.

So that's one way to do object detection, but it's a fairly limited approach. What we did here was detect this particular queen of diamonds with these particular features. If we used a jack of spades, say, it would be a totally different set of features; we couldn't use that same exact code to detect the jack of spades unless we started with jack-of-spades features.
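The RANSAC step in this demo corresponds to the estimateGeometricTransform function in Computer Vision System Toolbox. A minimal sketch, continuing from the matched points above; the choice of an affine model here is illustrative:

    % Estimate the transformation with RANSAC, discarding outliers
    [tform, inlierRef, inlierScene] = estimateGeometricTransform( ...
        matchedRef, matchedScene, 'affine');
    % Project the reference card's corners into the scene to outline the card
    [h, w] = size(refImg);
    refCorners   = [1 1; w 1; w h; 1 h];            % x,y corners of the reference image
    sceneCorners = transformPointsForward(tform, refCorners);
    figure, imshow(sceneImg), hold on
    plot(sceneCorners([1:4 1], 1), sceneCorners([1:4 1], 2), 'y-', 'LineWidth', 2);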
You could build up a library of all these different feature sets, but that's still not very effective. What you really want is an algorithm that lets you detect a playing card in general, or at least a face card, or maybe just a face in an image, or people, or cars. This is typically known as categorical object detection: you're trying to find a category, and there are ways to represent faces in general, rather than a specific face like we did before, using different types of features. Let's talk about that a little.

The way you do category detection is by using features (a different type of feature, known as regional descriptors) plus a machine learning algorithm, which is basically a classifier. The classifier helps us recognize which features are the ones that represent faces, and we train it on very large sets of data. For face detection you might use five thousand images of faces; for a people detector like this one, several thousand images at least, as many as you can get. The typical machine learning algorithms we use are support vector machines, or AdaBoost, or a cascade of AdaBoost weak learners, like in the Viola-Jones algorithm, which I'll explain in a little bit. So category detection uses features plus a machine learning algorithm.

Let's take an example. We have a System object called vision.PeopleDetector, which allows us to detect people in an image, and you can see here it detected a number of different people in this image. This algorithm uses HOG features (histogram of oriented gradients) and a trained support vector machine classifier. Let's dive a little into what a histogram of oriented gradients is, and what a support vector machine is, before we come back to using them in MATLAB.

A histogram of oriented gradients is basically a regional descriptor. Imagine this is our region: in the algorithm we're using, the regions are typically 128 by 64 (there's another size we use as well, I believe 96 by 48). We take a gradient image of this, looking at the gradient in both directions, and then we break that data up into cells. In this illustration the cells form a non-overlapping rectangular grid; typically for HOG it's an overlapping grid of cells. Then we make a histogram of each cell, which produces an image that looks like this one: each of these is a cell rendered as a histogram, and if you lay out all the histograms spatially where they occur, it looks like this. To turn this into a feature we can use with a classification scheme, we take each of these histograms and line them up, one after another, into one long vector. That becomes our representation of the image. Now imagine a larger image containing many people, or a larger scene: to find a person, you slide your HOG window over all the regions of the image to find the ones where the descriptor is most relevant. That's the feature type. We take this information from maybe a thousand images or so and give it to a classifier.
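To make the descriptor concrete, here is a minimal sketch using extractHOGFeatures from Computer Vision System Toolbox. The webinar doesn't call this function directly, so treat it as an illustration of the idea rather than the detector's internals; the file name is a placeholder.

    % Compute a HOG descriptor for a 128x64 region and visualize the cell histograms
    region = imresize(rgb2gray(imread('person.png')), [128 64]);  % hypothetical file
    [hogVector, hogVis] = extractHOGFeatures(region, 'CellSize', [8 8]);
    figure, imshow(region), hold on
    plot(hogVis);             % overlay the per-cell gradient histograms
    numel(hogVector)          % one long vector, as described above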
We train that classifier to recognize when there are people, and for each image in the training set we know whether there's a person in it or not, so we have ground-truth information as well. With support vector machines, how this works is: the classifier takes data and tries to split the space into person or not person, one or zero. It tries to draw a straight line, and it also tries to maximize the distance between the two classes, so that there's a big boundary region between the two areas. The problem is that, like the histogram of oriented gradients I just spoke about, this is not simple data where you've just got an x and a y value; it's multi-dimensional, and multi-dimensional data typically can't be separated with just a straight line. What you can do then is known as the kernel trick: you project the data into some other space, using maybe a polynomial or some other formula, such that you can draw the separating line in that space. By using the kernel trick we can use support vector machines, which is a fairly straightforward and simple algorithm. So that's what's inside the people detector: HOG features and a trained SVM classifier.

Let's go back to MATLAB to use this algorithm and see how it works. We clean up our workspace and windows a bit, and load the image we're going to detect people in. Here's an image that we ship with the Computer Vision System Toolbox; it shows a few members of our development team, plus me. We set up our System object, instantiate it, and apply it with its default settings. Then, with insertObjectAnnotation, one of the nice annotation functions I mentioned before for images and videos, we can annotate the image: we've detected people and put the strength of each detection up here (for this one the number goes off the screen). We can draw annotations around all sorts of objects; I'll be using this in the demos down the road. You can see that with default settings we've detected only two out of the six people in this image.

So now let's play with those settings. Let's look at the documentation and see what it says about the people detector and what we can do with it. Here's the basic description of how to use it. It has two different classification models it's been trained with; this is basically a description of the training data used when the HOG features were calculated and the SVM was trained: upright people at a size of 128 by 64 pixels, or upright people at 96 by 48. The default is 128 by 64. There's a bunch of other things we can control. Here's the classification threshold, which helps decide whether a detection is false or not; it controls whether a sub-region gets classified as a person. You can change it to be a little weaker, saying "I can be more forgiving, I can accept more false positives," or you can be really stringent.
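A minimal sketch of the default-settings run just described; the image file name is a placeholder.

    % Detect people with default settings and annotate detection scores
    I = imread('teamPhoto.png');               % hypothetical file name
    peopleDetector = vision.PeopleDetector;    % default 'UprightPeople_128x64' model
    [bboxes, scores] = step(peopleDetector, I);
    annotated = insertObjectAnnotation(I, 'rectangle', bboxes, scores);
    figure, imshow(annotated), title('Detected people and detection scores');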
But then you might miss some actual people in the image, kind of like we did so far. There are some other properties that are important to know, like MinSize and MaxSize. These aren't necessarily there to improve results; what they're useful for is reducing computational complexity. If you know people are going to be larger, or take up a certain number of pixels, you can set MinSize to be a little larger, so the detector won't look at a whole bunch of smaller windows, and that can save a lot of computation time. You can also change the scale factor, which controls how the window size grows: it starts, for example, at 128 by 64 (or the smallest size) and grows by a certain factor; you can see the default here grows by 5%. Also remember, as I mentioned before, we slide the window and look in a whole bunch of windows. You could move pixel by pixel, but by default the window (say 128 by 64) jumps over eight pixels to the right, and then another eight pixels for the next window. If you want finer control over that, you can change the window stride: how many rows and columns the window moves each step.

Now let's see what happens when we play with that. Because we know that in this image the people are generally pretty large, we can set MinSize to be pretty large, which lets the detector operate faster. At the same time we set the window stride so that instead of 8 by 8, a window starts every fourth pixel instead of every eighth. That's twice as many window positions, but by setting the minimum size larger we're not looking at as many different window sizes, so it will still run faster. Let's run that and see what happens. Now we've found three people in the image; a little better. We could keep playing with this and find more people, but what I want to show you next is a really useful way to use this detector, even though it doesn't find every person every time, and that's with video.

We're going to read an AVI file that ships with the Computer Vision System Toolbox, viptrain.avi, shot inside a train station. We set up a viewer with a video player, and while the video file isn't done, while we still have frames, we read a video frame, detect people in it, annotate the frame, and view it. Let's run that. You can see we're detecting this person; we may not detect him in every frame, but it's generally pretty good, and we're finding people. This is often where the detector is most useful: in video, where you don't necessarily have to make a detection in every frame. You can see it's finding people; it's not finding me while I'm sitting down on the bench here, but when I stand up again, you'll see that it detects me. Using video, we could also detect people and then track them, keeping track of them from frame to frame or over longer periods of time; I'll be showing you that in a little bit. So that's people detection, its use in video, and playing around with some of the basic capabilities.

Now let's talk about face detection.
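A minimal sketch of the video loop just described, using the shipped viptrain.avi clip; the MinSize and WindowStride values are illustrative, not the webinar's exact settings.

    % Detect people frame by frame in a shipped example video
    videoReader = vision.VideoFileReader('viptrain.avi');
    videoPlayer = vision.VideoPlayer;
    peopleDetector = vision.PeopleDetector('MinSize', [256 128], ...  % illustrative
                                           'WindowStride', [4 4]);
    while ~isDone(videoReader)
        frame  = step(videoReader);
        bboxes = step(peopleDetector, frame);
        frame  = insertObjectAnnotation(frame, 'rectangle', bboxes, 'person');
        step(videoPlayer, frame);
    end
    release(videoReader); release(videoPlayer);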
Viola-Jones face detection is a very famous algorithm. For its features it uses Haar-like features, which I'll explain in a little bit (they have this rectangular representation here), combined with AdaBoost classifiers in a cascading structure. This is a lot more complicated than the people detector, and frankly I see it do a lot better: it reliably detects faces, and it's also set up and trained to detect eyes, noses, mouths, and upper bodies. I'll show you a little of that in a minute. First, let's talk about the two main pieces of the Viola-Jones algorithm: the Haar-like features and the cascade of AdaBoost weak learners.

Haar-like features first. A Haar-like feature is an approximation of a Gaussian derivative filter. Where the filter is white, you can imagine a value of 1 in that region, and where it's dark, a value of negative 1, so you're summing one region and subtracting the sum of the other. Applied to face detection, it ends up looking like this: you use rectangles of all sorts of shapes and sizes, built from these four fundamental structures, and in the case of Viola-Jones you apply them to a 24 by 24 pixel window. You can imagine that this particular feature would be a good indicator of a face: if it had the right value, that region would likely be a face. But when you calculate these Haar-like features over a 24 by 24 window, over 180,000 rectangular feature values are returned. That's a lot of values; but when you use the Viola-Jones algorithm with its cascade of classifiers, the cascade minimizes the number of features that actually need to be calculated. Each stage, once trained, only calculates the feature values it knows are relevant for that particular stage. That will become clearer when I talk about the cascade classifiers.

Each cascade stage is gentle AdaBoost. At each stage, what you want is to reject negative samples quickly. The first stage doesn't need a perfect classifier to decide face or not face; it just needs to reject the things that aren't faces, because in any image, even one with a face or two in it, most of the data is not a face. So you want to reject those regions quickly; that's what the first stage does, and then each subsequent stage refines the result. It's like using one classifier to loosely determine whether something is a face, then taking that subset of the data and determining what's even more likely to be a face, and repeating, typically up to maybe 20 times: 20 different stages, each using different Haar-like features, so you're not calculating all 180,000 in every stage.

Let's go back to MATLAB and work with our face detector. This is a System object called vision.CascadeObjectDetector, and we're going to detect faces, which happens to be the default model, so you could call it without specifying one. We'll use the same image as before.
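A minimal sketch of the default face detection call; the image file name is again a placeholder.

    % Detect frontal faces with the default Viola-Jones cascade model
    I = imread('teamPhoto.png');                    % hypothetical file name
    faceDetector = vision.CascadeObjectDetector;    % default model: frontal faces
    bboxes = step(faceDetector, I);
    annotated = insertObjectAnnotation(I, 'rectangle', bboxes, 1:size(bboxes, 1));
    figure, imshow(annotated), title('Detected faces');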
Let's detect the faces in the image. When we look at the figure, it actually finds all of the faces. The Viola-Jones face detector is very robust and works very well: I've seen images of 40 people where, in its default mode, it finds maybe 38 or 39 of them, and it just requires a little tweaking to get the rest. Before we look at another mode, let's see what happens when you use insertObjectAnnotation with this. You can see we've put a little number on top of each face; we could assign a name to each of these, whatever we want, and then use that information. That's just a little extra annotation on top of drawing a rectangle around every face.

Now we're going to use a different set of trained data with our object detector: this time we detect upper bodies. In this case we got everyone's upper body, but we also got an awful lot of false positives down here. The way you typically deal with this, if you're trying to make a very robust object detector, is similar to what the Viola-Jones algorithm does internally with its cascade: first you use one kind of detector, then another. Here we first used the upper-body detector, and then, to make sure we have the right upper bodies, we use face detection after it. In this code, we take all the bounding boxes we found for the bodies, crop the images down, detect faces in all of them, and find out which ones actually contained a face: if the face detector doesn't return a bounding box, no face was found there. Using that fact, we keep only the upper-body detections that contained faces, and now you can see we're back to finding the faces in the bounding boxes where the upper bodies were.

Now let's check out how robust this face detection algorithm really is. We'll use just one face, and we'll try rotating it to see if that breaks the algorithm. imrotate is a function in Image Processing Toolbox that lets us rotate an image, and then we apply our face detector to the result. If we rotate by 10 degrees, it still detects the face. I'll dock this figure window so we can change the numbers and still see it. Let's change this to 15: it still detects the face. How about 20 degrees of rotation? Wow, still detects it. Let's try 25 degrees. Typically what I've seen is that 15 or 16 degrees is the max you can do reliably; in this image it went past 20. Let's try 22 degrees: found it. 23 degrees: no. So right between 22 and 23 degrees is where it broke down with this particular face, but generally somewhere between 15 and 20 degrees or so is where the algorithm breaks down. You can tilt your head a little and it will still detect the face. One thing about these detectors, though, is that you often have to train the detector for different in-plane or out-of-plane rotations.
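A minimal sketch of the two-stage idea described above: run the upper-body model first, then confirm each candidate with the face model. The structure mirrors the demo's description; the exact variable names are illustrative.

    % Stage 1: find upper-body candidates; stage 2: keep only those containing a face
    upperBodyDetector = vision.CascadeObjectDetector('UpperBody');
    faceDetector      = vision.CascadeObjectDetector;           % frontal-face default
    bodyBoxes = step(upperBodyDetector, I);
    keep = false(size(bodyBoxes, 1), 1);
    for k = 1:size(bodyBoxes, 1)
        candidate = imcrop(I, bodyBoxes(k, :));                 % crop each candidate
        keep(k) = ~isempty(step(faceDetector, candidate));      % face found inside?
    end
    confirmed = bodyBoxes(keep, :);                             % validated upper bodies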
Let's say you're looking sideways: there is actually a model built in to detect a profile face. Imagine you're training something to detect a car: you would need to train a different detector for the front of the car, the side of the car, and the rear of the car, because these often look very different. You might even want to train separately for an SUV versus a passenger sedan, because the fronts of those can look very different as well. Let's go back to our slides.

Now we're going to talk about object tracking, and we'll cover a couple of algorithms that we've shipped recently. There's a fundamental difference I want to point out first, between tracking from the bottom up (you've detected an object and you want to figure out where it's moved to, which is what we do with KLT tracking) and a multiple-object tracking approach, where you detect the objects in every frame and then associate them with tracks, to figure out "that's person number 16, that's person number 17" without getting them confused; a Kalman filter helps with that. So I'll talk a little about KLT-based tracking, then Kalman filtering, and then putting it all together in a multiple-object tracking framework that we ship as a demo inside Computer Vision System Toolbox.

First, face tracking with KLT. In this demo we use the Viola-Jones algorithm again to detect a developer's face (this is Dima), and then, within the region where we detected the face, we detect corners; these are our features. We have a System object for the KLT algorithm, which stands for Kanade, Lucas, and Tomasi, the three researchers who developed it together, and we call it the point tracker in our tools. Once we have the points that represent the object, we track those features. The nice thing is that you don't have to find every point in every frame; as long as you find enough of them, you can track the object, and this works particularly well.

Let's go to MATLAB and clean up our workspace again. We've got a video file here, tilted_face.avi; we read the first frame and detect a face in it. There's Dima's face. Now we take this region and detect features inside it; in this case we use a corner detector, detecting Shi-Tomasi minimum-eigenvalue corners. When we do that and draw markers for each corner we find, you see they're all over his face, which is pretty typical for face tracking.

Now we set up our point tracker; let's look at its documentation (I right-clicked and selected Help). It tracks points in video using the Kanade-Lucas-Tomasi algorithm, as I mentioned. We have to give it some initial points and an initial image, and we can set the block size and the number of pyramid levels. Point tracking is already quite robust to changes in scale, but the pyramid-levels parameter helps even more: with more pyramid levels you can track objects that come even closer than where they started.
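A minimal sketch of the initialization just described, using the tilted_face.avi clip named in the demo; the NumPyramidLevels value is illustrative.

    % Detect a face, find corners inside it, and initialize the KLT point tracker
    videoReader = vision.VideoFileReader('tilted_face.avi');
    frame = step(videoReader);
    faceDetector = vision.CascadeObjectDetector;
    bbox = step(faceDetector, frame);
    % Shi-Tomasi minimum-eigenvalue corners, restricted to the face region
    points = detectMinEigenFeatures(rgb2gray(frame), 'ROI', bbox(1, :));
    tracker = vision.PointTracker('NumPyramidLevels', 3);       % illustrative setting
    initialize(tracker, points.Location, frame);
    % On subsequent frames: [points, validity] = step(tracker, nextFrame);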
It gets a little harder when objects move further away; there's only so far away the tracker can work, since it still needs to be able to detect the features. So let's set this up and initialize it. Now we set up our video viewer, and once we detect the points, we want to make sure they lie in the region we care about, so we're actually using a combination of the point tracker, to find the points, and a geometric transformation. Let me make sure I ran all my sections of code; I don't think I did. There we go. Now we run the loop, and what you'll see is that in every frame we detect points. We don't necessarily need all of them, which is the same idea as in the card demo earlier: as long as we have a sufficient number of points, we can robustly find the location of the object in the image. Now it's worked through all the frames, and you can see it's robust to him moving around: he moved back and his face got smaller, he moved closer and his face got larger, he tilted his head. Rotation, scale changes, and translation: it's very robust. If you have one object with a decent amount of contrast that you want to track, this is a very good way to do it, but if you try to do this with a large number of objects it can slow down quite fast. So this is really good for tracking a small number of objects from frame to frame. Let's go back to our slides.

The next algorithm we're going to talk about is Kalman filtering. I'm not going to go into depth on how Kalman filtering works, but I will show you how you can use it with Computer Vision System Toolbox. Basically, it allows us to predict, and then correct based on a detection. That helps when we have occlusions: here we've got a ball moving across the image, and it goes underneath this structure, basically a little tunnel, so we can't see it. We need to predict where it is while it's in there, so that when it comes back out we can keep track of it and know it's still the same object. Kalman filtering is also useful for keeping objects apart: say we had another object up here and one over here; using Kalman filtering we can keep track of which object is which.

Let's switch to MATLAB and see what this is all about. Clear all, close all, clean up, and move on to the code. First let's look at the video we're going to use: we've got this ball, and it goes under a box, so we have some occlusion to test our Kalman filter on. First we need to detect the ball, in every frame, and the way we do that is with the foreground detector. Let's look at the documentation: it uses a Gaussian mixture model. It works on color or grayscale video and builds up a background model to determine whether individual pixels are part of the background or the foreground.
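A minimal sketch of the detection side of this demo: a Gaussian-mixture foreground detector plus blob analysis. The file name and property values are illustrative, not the webinar's exact settings.

    % Segment the moving ball with a GMM background model, then locate it as a blob
    videoReader  = vision.VideoFileReader('ball.avi');          % hypothetical file name
    fgDetector   = vision.ForegroundDetector('NumGaussians', 3, ...
                                             'NumTrainingFrames', 50);  % illustrative
    blobAnalyzer = vision.BlobAnalysis('MinimumBlobArea', 70);  % ignore small noise blobs
    frame = step(videoReader);
    mask  = step(fgDetector, frame);           % binary foreground mask
    mask  = imopen(mask, strel('disk', 3));    % light morphology to remove specks
    [~, centroid, bbox] = step(blobAnalyzer, mask);
    if ~isempty(bbox)                          % annotate only when the ball is found
        frame = insertObjectAnnotation(frame, 'rectangle', bbox, 'ball');
    end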
It uses a mixture of multiple Gaussians because sometimes there's a little movement in the background. In this video there isn't any, but outdoors you'd have a tree swaying in the wind, or a ripple on the water: all sorts of things you need to account for that are truly part of the background and not really part of the foreground. So in our code we use this foreground detector, and then we use blob analysis. Blob analysis lets us ignore all the pixels that aren't objects and find the one thing we care about, our ball. If we find the ball (if the result isn't empty), we insert an object annotation that says "ball." Let's run this code. You see we're detecting the ball, finding it; then all of a sudden it goes under: no ball found; and then the ball is found again. So that's just foreground detection and blob analysis to find the object moving in the video. It's a fairly simple video, so that we can really get to the heart of the matter with Kalman filtering, which is what we'll discuss next.

I'm not going to dive into the depths of Kalman filtering, because that could be a whole webinar in itself, but I will talk about the configureKalmanFilter function, which really makes life easy. It produces a Kalman filter with all the state matrices you need to run Kalman filtering, which is pretty complicated, so it makes a couple of assumptions. It doesn't give you all the flexibility a Kalman filter has, but it constrains it in a way that keeps it useful and makes it easier to set up. It has two basic motion models: it assumes either constant velocity (you're tracking something that moves at the same rate) or constant acceleration (like something you throw in the air, or something dropping under gravity; in our case, friction on the carpet is decelerating the ball). You also need to give it an initial location and an estimate of the error: with constant velocity you need a location variance and a velocity variance; with constant acceleration you need three parameters, adding the acceleration variance. You also need a model for the motion noise and a model for the measurement noise. To simplify all this, in the code I've given you some rough numbers that we know generally work; these are the things you would need to play with. I've set up my code to toggle between a constant-acceleration mode and a constant-velocity mode, with some initial parameters you can use; our documentation examples also give you good starting points for these initial estimates.

Let's run this; I'm going to start with the constant-velocity assumption. Now, the loop here is where I do my detection and tracking; let's undock this and take a look at it full size.
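A minimal sketch of the configureKalmanFilter call and the predict/correct pattern it feeds, continuing from the detection sketch above. The noise values are the kind of rough starting numbers mentioned in the webinar, not its exact ones, and ballDetected is a hypothetical flag from the blob-analysis step.

    % Set up a Kalman filter for a decelerating ball (illustrative noise values)
    initialLocation = centroid(1, :);          % first detected ball centroid
    kalmanFilter = configureKalmanFilter('ConstantAcceleration', ...
        initialLocation, ...
        [200, 50, 5], ...                      % initial location/velocity/accel. error
        [100, 25, 10], ...                     % motion noise
        100);                                  % measurement noise
    % Each frame: predict first; correct only when a detection is available
    predictedLocation = predict(kalmanFilter);
    if ballDetected                            % hypothetical flag from the blob step
        trackedLocation = correct(kalmanFilter, centroid(1, :));
    else
        trackedLocation = predictedLocation;   % coast on the prediction when occluded
    end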
Up at the top I'm doing my foreground detection and finding the ball with blob analysis; I'm actually also doing a little morphology to get rid of small objects it might find, maybe some noise in the background. Once I've detected the object, I configure my Kalman filter with the initial location of that detection, and then I keep going: if the object is detected, the filter does both the prediction and the correction; if the object isn't detected, it does only the prediction and carries that prediction forward as the track location. When it finally detects the object again, it corrects the track. Let's take a look and see what happens. Here you can see the corrected value; now it's predicted while the ball isn't found, and then corrected again. If you run it again, you'll see that because I'm assuming constant velocity while the ball is actually slowing down, the prediction gets a little ahead of itself. It's not too far off, but you can see it does get a bit ahead. So let's stop this and rerun it in constant-acceleration mode; I'll change this flag in my code to one and run it again. Now it's doing pretty well: it predicts, you can see the ball slowing down, and it does a much better job of predicting where the ball is, because it's assuming deceleration; you can really see it slowing down towards the end. So that's the basics of how you can use Kalman filtering without having to read an entire Kalman filtering textbook, which could fill a whole course, not just a webinar.

I've got one last demo to show. I'm not going to dive into this one because it's complex, but it builds up from the object detection capabilities I showed you before, with the foreground detector, a little morphology, and blob analysis. It then uses a Kalman filter to predict where each particular object is moving, and we maintain tracks, assigning numbers to everybody as they walk through the scene. But if we have multiple objects, how do you keep track of who's who? That's where we use the Hungarian algorithm to assign detections to tracks, and we use the information from the Kalman filter to compute the cost values that tell us which one is which. I won't go into a lot of that, but I do want to go back to MATLAB and run the multiple object tracking demo that ships with the product. It's very complex; there's a lot to it and a lot of write-up with it, but let me quickly run it so you can see what's going on. The video it works with is actually pretty big, a little too big to show fully on my screen right now. What you see here is the foreground map of the objects being detected by the foreground detector, and as people enter the scene they get a bounding box. This is track number one; as the next person enters the scene you can see him start to appear over here, and he gets track number two. Now we're keeping track, using the Hungarian algorithm, of who's who across the different tracks.
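The Hungarian-algorithm step corresponds to the assignDetectionsToTracks function in Computer Vision System Toolbox. A minimal sketch of how a cost matrix built from the Kalman filters feeds it; the tracks array, centroids matrix, and the non-assignment cost are illustrative.

    % Assign current detections to existing tracks with the Hungarian algorithm
    numTracks = numel(tracks);                 % hypothetical array of track structs
    numDetections = size(centroids, 1);        % centroids from blob analysis
    cost = zeros(numTracks, numDetections);
    for t = 1:numTracks
        % distance() scores how well each detection fits a track's Kalman prediction
        cost(t, :) = distance(tracks(t).kalmanFilter, centroids);
    end
    costOfNonAssignment = 20;                  % illustrative threshold
    [assignments, unassignedTracks, unassignedDetections] = ...
        assignDetectionsToTracks(cost, costOfNonAssignment);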
You can see we don't actually fully lose him behind the tree over there. We've got a third person here lurking behind the pole; he'll pop out in a second. Actually, we've lost that one completely. If you really wanted to get sophisticated, you would also keep track of people who reappear, but you would need to be able to recognize people to some degree, to recognize them if they leave the scene and then come back. Here we're now keeping track of a whole bunch of different people all at the same time. I suggest taking a much closer look at this demo to get a sense of what's going on, but that's multiple object tracking in a nutshell. Let me just close that.

So that was the multiple object tracking demo, and that's pretty much all I had. We only had enough time to show a few examples, but you may wish to take a look at some of the other materials available on our website, and in particular at that multiple object tracking demo, to examine it more closely. We also have full documentation available online, so you can look more closely at the particular functions and System objects I showed in this webinar, or at additional capabilities I didn't get a chance to cover. I also suggest contacting your sales representative, who can help guide you with our products and point out other capabilities we may have that meet your needs; for example, we have an entire library of machine learning algorithms in Statistics Toolbox that I didn't get a chance to talk about. That completes the webinar; thank you for joining us today.
Info
Channel: MATLAB
Views: 142,401
Rating: 4.9172416 out of 5
Keywords: MATLAB, Simulink, MathWorks
Id: 9_6utqvsCtA
Length: 46min 56sec (2816 seconds)
Published: Fri Apr 28 2017