Object tracking in video with OpenCV and Deep Learning

Video Statistics and Information

Captions
Hello, today I'm going to show you a code pattern where we take a video and create an object detection model. We'll be able to detect cars, and then we'll use a Jupyter Notebook to take another video, identify those cars, track them with OpenCV, and label them on the video, as you see here. As usual, we've made the code public so you can take it, try it out, and use it for your own solutions. You can find the code pattern by going to developer.ibm.com/patterns and searching for "counting cars"; you'll see our code pattern page there, and if you click on "Get the code" it will take you to a public GitHub repo with detailed steps in the README that walk you through training the classifier, using auto labeling on the video, and then running the notebook with object tracking.

The first thing we need to do is create a model, so we'll start by creating a data set. This is PowerAI Vision, and it makes it really easy to create an object detection model from a video. I'll create a data set first and just call it "cars". The great thing about working with a video is that a short video is many images. I could upload a bunch of images, but instead I'm just going to upload a short video with a bunch of moving cars, so I'll drag and drop it to add my video. I think this is about a 30-second video, but even that includes many images, and when you're doing deep learning you want a good data set. This one is really quite small, but if we take the pictures from all the frames of the video, we have quite a few pictures of cars. They're somewhat redundant, but the angle changes, the lighting changes, and by using all those frames we can come up with a very good model.

First I'm going to teach it what a car is, so I'll add an object label of "car" and save that. Now I'll go through a few of the frames and identify where the cars are in them. The first thing you'll notice is that we're still not writing code; we'll get to that later with the notebook. This is PowerAI Vision, and it's very easy to work with the video with just a few clicks of the mouse. First I need to extract some frames from the video, which I'll do by choosing auto capture frames every 5 seconds, and it will extract images from the video for me. Now I'll take these six frames and identify all the cars in them, labeling about 30 cars by drawing a box around each car in each frame. This part is manual and a little bit tedious, but what I'm going to show you is that with a little bit of manual labeling I can train a model, and that model will be good enough to auto label the rest of the frames and identify all the cars in all of them. I'm speeding up the video a little here, but you can see it's not too bad labeling some cars; if you had a really large data set, though, you'd definitely want some auto labeling happening. So I'll finish this one up, adjust that just right, there's my car, and save. I have my six frames, I think I labeled 27 cars, and now I'm going to train a model. The training will take a little while, but at least I won't have to sit here manually doing all the labeling; I'll leverage the trained model to do auto labeling next. So I'll hit the Train button, making sure this is being trained for object detection. Remember, we're not just trying to decide whether an image is a car or not a car; we want to identify each car in the picture so we can count them and track them as they move. So I press Train and we'll come back when it's finished.
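The auto capture step above happens inside PowerAI Vision with a click, but if you want to see roughly what extracting a frame every few seconds looks like in code, here is a minimal OpenCV sketch. The video filename, the interval, and the output naming are placeholders, not the tool's actual implementation.

```python
import cv2

# A minimal sketch of frame extraction at a fixed interval, roughly what the
# "auto capture frames every 5 seconds" option does. The video filename, the
# interval, and the output file names are placeholders.
INTERVAL_SECONDS = 5

cap = cv2.VideoCapture("cars.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
step = max(1, int(fps * INTERVAL_SECONDS))

frame_number = 0
saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_number % step == 0:
        cv2.imwrite("frame_%04d.jpg" % saved, frame)
        saved += 1
    frame_number += 1

cap.release()
print("Saved %d frames" % saved)
```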
I'll speed up this part of the video while we watch it train, and as soon as it's complete we'll take a look at the model details and deploy the model. Deploying the model gives me a REST endpoint that I can use for inference, for object detection, but what I want to do first is use that deployed model for auto labeling. So let's go back to the cars data set we started with and make it a bigger, better data set. I'll open it up again; remember, I have a short video, I extracted some frames, and I did some manual labeling. Now I can click the auto label button, say let's do every second, and pick the cars model I just built from my manual labeling. I won't speed this part up. It's labeling the frames, and as you see, they just increased down below. If I click through them, the green boxes are the cars that were auto labeled, and you'll see blue where I manually labeled them. Clicking through all the frames, you can see we just significantly increased the number of frames and the number of cars that were labeled. Now I'll train the model again using this data set, which has significantly more pictures of cars in it. With a nice UI you can adjust the pictures and verify that the auto labeling was correct; in our case we're done. Taking a look, it looks like 150 cars are now labeled instead of our original 27, so I'll click the Train button again, because I want my model to be built with this bigger data set; I think it will give me a more accurate model for detecting cars in the video. I make sure it's an object detection model again and click Train. Once this is done training, this is the model I want to deploy, so I'll click Deploy and I'll have a REST endpoint I can use from my Python code.

So here's our Python notebook. In this notebook we have documentation, but we also have Python code. We're going to run this code and use the OpenCV Python API to process the video, so I can take a video and turn it into individual frames; now I'm doing it in code. I can take those frames, use just a sampling of them, and send them to PowerAI Vision using the endpoint we just deployed, and it will identify the cars in each image and return the coordinates so we can draw a box around each car. In addition, I'm keeping track of the cars and numbering them, because I wanted to follow them down the road. Instead of just saying "well, I had five cars in one frame and seven cars in another frame" and coming up with some kind of average number of cars on the road, I wanted to count them as they cross an invisible finish line. Counting things as they reach a point of interest applies to a lot more use cases than just how many you saw on average: if I can count them as they cross, I can get cars per second, but I can also determine, say, how long it took a car to get from point A to point B. So I'm tracking them, and that's mostly done with OpenCV code.

Let's take a look at some of the more important or more interesting parts of the code. First of all, at the very top of the notebook, or close to it, you'll need to customize this URL so that it points to the endpoint of the model you just deployed; we're going to use it to do object detection on individual frames with the model you just trained and deployed in PowerAI Vision.
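To make that step concrete, here is a sketch of what such an inference call can look like from Python. The endpoint URL, the multipart field name, and the response keys are assumptions about a typical PowerAI Vision deployment rather than the notebook's exact code, so check your own deployment's REST documentation and adjust; the 80 percent confidence cutoff mirrors what's described below.

```python
import requests

# A hedged sketch of the inference wrapper described below: POST one frame of the
# video to the deployed PowerAI Vision model and read back the detected cars.
# The URL path, the "files" multipart field, and the response keys ("classified",
# "label", "confidence", "xmin", ...) are assumptions; adjust to your deployment.
POWER_AI_VISION_API_URL = "https://your-host/powerai-vision/api/dlapis/your-model-id"

def detect_objects(frame_path):
    """Send one frame image to the deployed model and return its JSON response."""
    with open(frame_path, "rb") as image_file:
        response = requests.post(
            POWER_AI_VISION_API_URL,
            files={"files": image_file},
            verify=False,  # many on-prem installs use self-signed certificates
        )
    response.raise_for_status()
    return response.json()

# Example: keep only confident detections, roughly the 80% cutoff the notebook uses.
result = detect_objects("frames/frame_0001.jpg")
for obj in result.get("classified", []):
    if obj.get("confidence", 0) >= 0.80:
        print(obj["label"], obj["xmin"], obj["ymin"], obj["xmax"], obj["ymax"])
```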
Next there are a few interesting constants. We're downloading a video from Box; this video is very similar to the one we used for training, but it's not the same video, it's a different piece of the same footage. You might also be interested in the frames and output folders: you'll see that we use Python to split the video into individual frames and store them there, and we also annotate them with boxes and numbers, so especially when debugging you can look at the individual frames and see what's working and what isn't as we draw lines on those frames. The sampling constant is used to send only a sample of the frames to PowerAI Vision for object detection; the cars really don't move that much from one frame to the next, so I'm only classifying some of the frames and then using a tracking method to follow each car from one frame to the next, while still drawing on every frame. A few colors are set here. The finish line, I think, is hard-coded, but the starting line is one I was experimenting with; you'll see the comment here. If we go from the very top, those cars are very small in the distance, and I can sometimes improve the results a little by creating a starting line, so that I'm really counting and detecting the cars after they cross that line, and I stop tracking them when they cross the finish line.

We have some Python requirements, and we download the video right here in the notebook, so you don't really have to do anything except run it. Now the parsing: as you see, we're using OpenCV, so we're exploding the video into frames. In addition, OpenCV provides things like frames per second, height, and width that will help as we draw lines on the frames, and the frames per second is what I'm using to create a cars-per-second metric. This is the region of interest, so my finish line is basically set here. Now the inference wrapper, where we want to detect cars in an image: you see how simple this is, we just post the file, which is one specific frame of the video, to that PowerAI Vision endpoint, and what we get back is JSON that describes the detections. I have it right here; you can see the output of a test on a single frame. It detected seven objects, all cars, because that's the only thing we're looking for, and it gives you the x and y min and max so you can draw a box. There's also a confidence level, so we're throwing out, I think, everything under 80 percent; I don't know if we have any of those, but it gives you everything you need to do object detection and label your video.

The rest of the code is mostly video processing. First, we split the video into frames. Then we go through the frames and, with that sampling, take every nth frame and send it to detect objects; that was the very simple routine we just saw. We detect the objects, get the JSON, and I just keep track of the JSON. Once I'm done with that, I loop through the images again. Here are some helper functions; I won't go through the details on those, but let me show you where they're called. Here I'm looping through all the frames again, and first I'm calling update trackers: I'm keeping track of all these objects with OpenCV and following them along down the road.
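The tracking helpers aren't shown in detail in the video, so here is a stripped-down sketch of the general idea using OpenCV's tracking API. The helper names and box handling are illustrative, not the code pattern's exact functions; KCF is just one tracker algorithm, it ships with opencv-contrib-python, and on some OpenCV versions the constructor lives under cv2.legacy.

```python
import cv2

# A stripped-down sketch of the tracking idea: start an OpenCV tracker for each
# newly detected car, then update every tracker on each subsequent frame so the
# boxes follow the cars without re-running object detection on every frame.
# KCF is one choice of algorithm (requires opencv-contrib-python; on some OpenCV
# versions use cv2.legacy.TrackerKCF_create). Names here are illustrative.

trackers = []  # list of (car_number, tracker) pairs

def track_new_car(frame, box, car_number):
    """box is (x, y, width, height) in pixels, converted from the detection JSON."""
    tracker = cv2.TrackerKCF_create()
    tracker.init(frame, box)
    trackers.append((car_number, tracker))

def update_trackers(frame):
    """Advance every tracker to the current frame and return the boxes still being tracked."""
    updated = []
    for car_number, tracker in trackers:
        ok, box = tracker.update(frame)
        if ok:
            x, y, w, h = [int(v) for v in box]
            updated.append((car_number, (x, y, w, h)))
        # If not ok, the tracker lost the car (for example, overlap or it left the frame).
    return updated
```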
In this routine I can detect when a car is the same car I already saw, just moved slightly down the road, so I update that tracker so the box moves down the road with the car and the car keeps the same number. And right here there's a track routine, one of the helper functions I created, that takes that output and says: if this car is a new one, not something I was already tracking, let's add a new object. Now the car has a number and a box, and I'm going to label it and start tracking it. So that's how I'm doing the tracking and following these cars down the road. The other thing is that as they reach the finish line, the metrics get added up so I can count how many cars I see. You see they're numbered as they come down, but the total gets incremented as each car passes, whether it's in the left lane or the right lane; here's the cars-per-second and the lane counter. The code is all here, with the OpenCV that annotates: on every frame I'm drawing my finish line, drawing the left and right little hash marks, and drawing boxes on the cars along with their number and the total cars detected. Depending on what algorithm we use with OpenCV, we can lose cars as they overlap with each other, and I've recommended some ways you can improve that number.

Here I'm still in the notebook, but if I run this we have a nice way of looping through the annotated images. You can see them in the output directory, but I'm looping through them so you can watch right here in the notebook, which is very satisfying: you see the results of your work, and you can actually count these cars as they go down the road. In addition, if you really want to create a labeled video, there's a command here that's commented out, and I also have a tools directory where you can take all these frames and reassemble a video.

Once again, here are the results: we used PowerAI Vision to identify the cars, and we're using Python and OpenCV to track them and count them as they cross the finish line. All the code is there in those helper functions if you want to look at how to label, how to draw boxes and lines, that kind of thing, on a video. It's really a fun code pattern to try out; I hope you give it a try, find your own use case, and do something similar.
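To make the finish-line counting idea above concrete, here is a minimal sketch: each tracked car is counted once, the first time its box reaches a horizontal line, and cars per second falls out of the video's frame rate. The line position, the (x, y, w, h) box format, and the dictionary fields are illustrative assumptions, not the notebook's exact data structures.

```python
# A minimal sketch of the finish-line counting: a tracked car is counted exactly
# once, the first time its box reaches the line, and cars per second is derived
# from the frame rate OpenCV reports for the video. FINISH_LINE_Y, the
# (x, y, w, h) box format, and the "counted" flag are illustrative assumptions.
FINISH_LINE_Y = 400  # pixel row of the finish line (placeholder)

def count_crossings(tracked_cars, total_count):
    """tracked_cars: list of dicts like {"bbox": (x, y, w, h), "counted": False}."""
    for car in tracked_cars:
        x, y, w, h = car["bbox"]
        if not car["counted"] and y + h >= FINISH_LINE_Y:
            car["counted"] = True
            total_count += 1
    return total_count

def cars_per_second(total_count, frame_number, fps):
    """Average rate so far, using the video's frames-per-second value."""
    elapsed_seconds = frame_number / fps if fps else 0
    return total_count / elapsed_seconds if elapsed_seconds else 0.0
```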
Info
Channel: Mark Sturdevant
Views: 146,539
Keywords: Jupyter, Python, OpenCV, PowerAI, Deep Learning, Object detection
Id: 19vaot75JCY
Length: 15min 15sec (915 seconds)
Published: Thu Sep 06 2018