YOLOv8 object detection + Deep SORT object tracking | Computer vision tutorial

Video Statistics and Information

Captions
Hey, my name is Felipe and welcome to my channel. In this video we are going to do object detection using YOLOv8, and we are also going to do object tracking using Deep SORT. This is exactly what you will be able to do with today's tutorial: you can see a video of people walking from one place to another, and each person in this video has a bounding box of a given color. The way it works is that as long as the object tracking algorithm, as long as Deep SORT, is able to track a person through all the different frames, the bounding box remains the same color. Let me show you an example: this person over here has a brown bounding box, and if we follow this person through the frames, you can see the bounding box remains brown the entire time. Another example: this person with a purple bounding box, which stays purple through all the frames, and the same happens with this blue bounding box over here. That means each person is being tracked through all the frames of the video. So this is exactly the project we will be working on in today's tutorial. Now, let's get started.

Let me show you the repositories we are going to use in this project. This is the official Deep SORT repository, and we are going to use a fork I made from it, this one, on my account. The reason I made this fork is that I needed to make a very small edit in one of the functions to make it TensorFlow 2 compatible, because I am going to use TensorFlow 2 in this project. Other than that, it is exactly the same Deep SORT repository. This is the algorithm we are going to use in order to do the object tracking.

Now let me show you the object detector we are going to use, because if we are going to do object tracking, we definitely need detections on which to do the tracking. This is the object detector we are going to use to produce the detections we will input into Deep SORT: we are going to use Deep SORT as the object tracking algorithm and YOLOv8 as the object detector. This is the official YOLOv8 repository. We are not really going to use the repository itself; YOLOv8 is available through a Python package, and that is exactly how we are going to use it, but I wanted to show you what the official repository looks like, and I'm going to post a link to it in the description of this video.

Now let me show you the data we are going to use in this project. This is the video we are going to use to test how the object detection and the object tracking work, and you are probably familiar with it already, because it is very commonly used in this type of object tracking tutorial. You can see it is a video of many, many people who are just walking around from one place to another. This is the data we are going to use today.

Now let's go to PyCharm, because I need to show you a file I have created, something like an auxiliary file, which is going to make things much
much simpler in this tutorial; it's going to help us a lot with everything we're going to do today. This file is called tracker.py and it contains a class called Tracker, which is basically a wrapper around Deep SORT. We are going to use Deep SORT, but through this wrapper I created, in order to make the whole process much more straightforward. You can see the wrapper has a few functions, and the idea is that they absorb all the complexity and all the details of how to use Deep SORT, so we don't really have to worry about any of that and can just use Deep SORT from a very high level.

One very small but important detail about this wrapper: you can see I have hard-coded this part, which is the path to a model Deep SORT needs in order to compute the tracking. Please remember to download this file (I will tell you exactly where to find it in the description of this video) and to save it in exactly this location. If I show you my root directory, you can see I have created a directory called model_data, and inside it I have these three files. Please follow the same structure, because otherwise this is not going to work, since the path is hard-coded; or edit this line so it points at the location of the model, because you definitely need this model in order to run the algorithm.

So that's pretty much everything you need to know about the repository, the tracking algorithm, the object detector, the data (the video we are going to use to test today's work) and the wrapper I created. I have already shown you all these details, and now we can just start coding.

The first thing we're going to do is clone the Deep SORT repository, but my fork of it, with the changes I have made; otherwise it's not going to be compatible with the dependencies I have specified for this project. Let me show you the requirements file super quickly: you can see I have also specified the Python version I'm using, which is Python 3.7, and this requirement, which is TensorFlow 2.11; we needed the small change I showed you earlier in order to make Deep SORT compatible with this TensorFlow version. So now we clone the Deep SORT fork from my account: I copy the URL, git clone, and that's pretty much it. If I refresh, this is where the repository has been cloned. Notice that if I go back to tracker.py, at the top of the file there are a few imports from deep_sort, and they are not going to work unless you clone the Deep SORT repository, so please remember to do that as a first step.

So, assuming you have already installed all the requirements, cloned Deep SORT, and everything is completely ready, let's start working on this project. We are going to take it one step at a time; it's always a good idea to take projects one step at a time. Let's start by importing os and importing OpenCV. Now let's define the path to the video we are going to use, and in order to move one step at a time, the only thing I'm going to do for now is just
reading this video and displaying one frame at a time with imshow, so we make sure we can read the video and visualize its frames. That's what I'm going to do in order to get started. The video path is os.path.join('data', 'people.mp4'), which is a very appropriate name for this video because it contains people walking; maybe it should be people_walking.mp4, but people.mp4 is just fine. Now I call cv2.VideoCapture and specify this video path, and this will be called cap. Let's take it one step at a time: now I read a frame from this video with cap.read(), which returns ret and frame, and while ret I just visualize the frames: cv2.imshow in a window called 'frame', then cv2.waitKey; zero... sorry, I'm going to specify something like 25 milliseconds, which will be just fine. Obviously I also have to read a new frame, ret, frame = cap.read(), and before I forget, I release the capture and destroy all windows by calling cv2.destroyAllWindows().

That's pretty much it, so let's see if we can execute this piece of code successfully. You can see we are just looking at the exact same video I showed you a few seconds ago. We may need to adjust the 25 milliseconds, because I have the impression it's running a little faster than the original video, but this is only to show you that we can read this video and display one frame at a time. Now let's continue.

We can read frames from this video; what we need to do now is use YOLOv8 to produce the detections we are going to input into the object tracking algorithm. To do that, I write from ultralytics import YOLO, and somewhere around here I define the model: model = YOLO('yolov8n.pt'). We are going to work with a YOLOv8 pre-trained model; you can just use a pre-trained model in case you don't have a model of your own, and yolov8n.pt, the nano model, is exactly the one we are going to use. Let's take it one step at a time and see if we can just create the model... everything seems to be working fine, because we didn't get any error.

Now let's produce the detections. I'm going to call model, just opening parentheses, because this is something like a function we are calling, and I'm going to input the frame I have just read from the video; I'll call the output detections. To go one step at a time again, I'm just going to print detections and see what happens. Okay, you can see I am getting a lot of information, so everything seems to be working just fine; we are moving one step at a time, but making a lot of progress. Now what we need to do is unwrap these detections, because what we got back looks like an entire class: it contains absolutely all the information we need in order to work on this tutorial, but it may not be in the exact format we need it to be, so we need to unwrap all the information from it.

I think it's better to call it results instead of detections, because I'm going to use the name detections later on. results is a list containing all the detections, and I'm going to iterate over it: for result in results. As I am running inference on a single frame, on an individual frame, this will always be a list of length one; long story short, we are doing this iteration but actually only iterating once, because there's only one object inside results. We could do it in a different way, something like results = results[0], but I think it's a little cleaner to just do this iteration; just remember we are doing it only once. Now let's create another object, a list called detections, which is where we are going to save all of our detections, and let's do something like this: for r in result.boxes.data.tolist(). I think it was something like that; this is a very, very custom object type in YOLOv8, and we need to make it look like a list so we are able to iterate over it. I think this is the way to do it; if it's not, we are going to get an error and I can just fix it. Once we are iterating over this object, I'm going to print r to see what it looks like, and then we'll unwrap all the information from within it. Let's see... no, I've made a mistake, because I'm iterating here and actually it should be there; now everything is okay. This is the result we are getting, this is r, the print we are doing over here, and this is all the information we need from our detections: all the people we are detecting in this video.

The way we are going to unwrap this information is like this: x1, y1, x2, y2, then the confidence value, which I'm just going to call score, and the class ID; all of that equals r. Remember that we are using a pre-trained YOLOv8 model, and this model was trained on the COCO dataset, so we are detecting all the COCO classes, and the class ID is exactly the index of the class we have detected. If I show you my root directory, this is a file containing all the classes we are able to detect with this model, and it also tells us the index of each one of these classes. In the case of person, which is the object we are going to detect most often in this video, it is the first element in the list, which means it's class ID 0.
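To recap the detection side so far, here is a minimal sketch. The `unwrap_detections` helper name is mine, not from the video, and `run()` assumes the `data/people.mp4` path and the `yolov8n.pt` weights described above (Ultralytics downloads the weights on first use):

```python
import os


def unwrap_detections(result):
    """Flatten one Ultralytics Results object into [x1, y1, x2, y2, score, class_id] rows."""
    detections = []
    for r in result.boxes.data.tolist():
        x1, y1, x2, y2, score, class_id = r
        detections.append([int(x1), int(y1), int(x2), int(y2), score, int(class_id)])
    return detections


def run(video_path=os.path.join('data', 'people.mp4')):
    # Local imports so the helper above stays importable without these packages.
    import cv2
    from ultralytics import YOLO

    model = YOLO('yolov8n.pt')  # COCO-pretrained nano model
    cap = cv2.VideoCapture(video_path)
    ret, frame = cap.read()
    while ret:
        for result in model(frame):  # list of length one for a single frame
            print(unwrap_detections(result))
        cv2.imshow('frame', frame)
        cv2.waitKey(25)
        ret, frame = cap.read()
    cap.release()
    cv2.destroyAllWindows()
```

Calling `run()` reproduces the read-detect-display loop described above; in each printed row, class ID 0 is "person" in the COCO class list.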
It's index 0, and if I go back, you can see this number is 0 for almost all of our detections. Now, here we have detected something different from zero: this is a 14, maybe another object. In most cases it's a 0, but we are also detecting a 14, so let's see what 14 is... this is 0... so 14 is bird. We are detecting birds in this video? Maybe we are detecting these birds with a very low confidence, or maybe there are birds I haven't noticed, or maybe it's a misdetection; it doesn't matter for now, let's continue.

So these are the detections we are getting, I am grabbing all of them, and now it's time to call our wrapper. Now that we have successfully unwrapped absolutely all the detections from the YOLOv8 object detector, it's time to call the Deep SORT object tracker in order to start doing the tracking. I'm going to call detections.append and create a list with x1, y1, x2, y2, score and class_id; this is all the information Deep SORT needs in order to do the tracking. Now that we have the detections, the only thing we need to do is call the tracker. Maybe I should cast x1 as an integer, otherwise I may have a problem; or maybe another way to do it is here, where we unwrap everything, so we just read everything as integers: x1 = int(x1), and exactly the same for all the other variables. It's a little noisy, but it's okay, and the same with class_id. Okay, perfect: we have unwrapped all these values, and everything that should be an integer is now an integer.

Now it's time to call the wrapper. First I import it, from tracker import Tracker, and then the only thing we need to do is initialize it: tracker = Tracker(), creating a new instance. Then we call tracker.update, and let me show you what we need to input to update, which is the most important function of the tracker, because it's where all the tracking information gets updated. We need to input two parameters: the frame and the detections. The detections are exactly the object we have just created over here, and we also need to input the frame, because of the way this specific algorithm works: Deep SORT computes some features on top of the objects that were detected; it's going to crop them out of the frame and extract features from those crops, so we definitely need to input the frame, and then the detections.

Now the only thing left is to iterate over all the tracks in this algorithm, so I'm going to write for track in tracker.tracks, and for each track I'm going to access the bounding box, bbox = track.bbox, and then the track ID, which is the most important piece of information from the object tracking algorithm: the ID the tracker has assigned to this object, so track_id = track.track_id. And that's pretty much it. Remember we are using an object tracking algorithm, and each one of these tracks is an object in the video; we are accessing the bounding box of that object in this specific frame, and the track ID Deep SORT has assigned to it. The ID will be exactly the same across all the frames in this video. Each object, or actually each person, because this video contains people and basically all the objects we are going to detect are people (well, we are detecting birds too for some reason, but it doesn't matter, most of the objects are people), is going to have an ID assigned, and Deep SORT is going to keep the same ID across the whole video: absolutely all the frames containing that person will be labeled with the same ID. So the bounding box is the specific box for that person, or that generic object, in that specific frame, and the track ID identifies that object or person across all the frames of the video.

That's pretty much it for the tracking. What we need to do next is plot a bounding box, a rectangle, on top of the frame for each one of these objects. To do that I'm going to define another variable called colors: I import random, and colors will be a tuple of random.randint(0, 255) repeated three times, for j in range — I don't know — 10.
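The tracking step above can be sketched like this. It is a sketch assuming the Tracker wrapper from the video's tracker.py, with the `update(frame, detections)`, `.tracks`, `.bbox` and `.track_id` interface just described; `color_for_track` and `track_frame` are helper names I made up for clarity:

```python
import random

# Ten random BGR colors; track IDs are mapped onto this list with a modulo,
# so a track keeps its color for as long as the tracker keeps its ID.
colors = [(random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
          for _ in range(10)]


def color_for_track(track_id, palette=colors):
    """Pick a stable color for a track ID, wrapping around the palette."""
    return palette[track_id % len(palette)]


def track_frame(tracker, frame, detections):
    """Update the (assumed) Deep SORT wrapper and yield (box, track_id) pairs."""
    # The wrapper crops the detected boxes out of the frame and extracts
    # appearance features from them, which is why it needs the frame itself.
    tracker.update(frame, detections)
    for track in tracker.tracks:
        x1, y1, x2, y2 = map(int, track.bbox)
        yield (x1, y1, x2, y2), track.track_id
```

In the main loop, each yielded box would then be drawn with `cv2.rectangle(frame, (x1, y1), (x2, y2), color_for_track(track_id), 3)`.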
So I'm creating now a variable, a list called colors, which contains 10 completely random colors, and what I'm going to do is plot each one of the persons in the video with a different color: every time we get a new ID, a new identifier, we are going to plot this new object we are detecting with a new color; we are just going to select another color from this list. That's the idea of what we will be doing now. So I call cv2.rectangle, and I'm going to input the frame, then the top-left corner, which will be (x1, y1); I haven't defined x1 and y1 yet, but it's basically x1, y1, x2, y2 = bbox. Then (x2, y2), which is the bottom-right corner, then the color, which will be colors indexed by the track ID, but mod the length of colors. Remember, this list has only 10 elements and we will be detecting many, many objects, so the track ID could reach a huge value, and if we don't do something like this we are just going to get an error; we definitely need this mod to make it work. Now that we have specified the color, the only thing left is the line width for this rectangle, and that should be all.

Let's see if we are doing everything correctly, let's see what happens if I press play. We should be looking at our frames just like before, but with a bounding box on top of them... and we're not getting it, so we have done something wrong. Let's see what it is: we may need to cast these values as integers; let's see if that's the problem... yes, that was it, I needed to cast everything as integers, and you can see that now we are getting exactly what we should be getting, which is our video, one frame after the other. We are not getting real-time performance, obviously: we are executing Deep SORT, we are also executing YOLOv8, we are running everything on a CPU, and we are plotting many different objects, so Deep SORT is computing features and doing different things for each one of the objects you can see in this image. We are doing a lot of work, so we don't have real-time execution, but nevertheless everything seems to be working properly.

The only thing we need to do now is save this video back to our computer. We have already solved the problem, everything is ready: we have implemented the object detection using YOLOv8 and the object tracking using Deep SORT. The only thing left is to save the video, with all the frames and all the detections, back to our local computer, so we can analyze it in a much better way; but everything is ready, and we should be super happy with ourselves, because we have solved the problem already. What I'm going to do is create a new object here called cap_out, which will be cv2.VideoWriter, and I need to specify the location where I'm going to save this video; I'm just going to save it in my current directory with the name out.mp4. You can choose another name or another location, but it's exactly the same. Now we need to input a couple of values, and it will be much easier if I just look at an existing file for them. The first one is related to the codec we are going to use in order to write this file; we are writing an .mp4 file, so we definitely need that value. Then we need to specify the frames per second of this video, and we want the output video we are going to write to have the same frames per second as the original video. So we are going to call cap, which is the OpenCV object for the video we are reading, and say cap.get with the FPS constant, so we are getting the frames per second of the original video; that is exactly what we are doing here, and we are going to give the output video the exact same frame rate as the original, which is a very good thing, because we want a real experience: we want this new video to look as similar as possible to the original one. Then we need to specify the size, which will be (frame.shape[1], frame.shape[0]), and I press enter. That's pretty much it, if I'm not mistaken. So I'm just going to take this cap_out object and write the frame here, just before I read a new frame; I'm writing the previous frame, the one I have just drawn all the rectangles on, so that's exactly the frame we need to write. And at the end, I release the memory from cap_out too.
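The writer setup just described can be sketched as below. The helper names `frame_size` and `make_writer` are mine, and `'MP4V'` is one common fourcc choice for an .mp4 container; exact codec availability depends on your OpenCV build:

```python
def frame_size(frame_shape):
    """OpenCV's VideoWriter wants (width, height); numpy frames are (height, width, channels)."""
    return (frame_shape[1], frame_shape[0])


def make_writer(cap, first_frame, out_path='out.mp4'):
    """Create a VideoWriter with the same frame rate as the source capture."""
    import cv2  # local import so frame_size stays importable without OpenCV

    fourcc = cv2.VideoWriter_fourcc(*'MP4V')  # codec for the .mp4 container
    fps = cap.get(cv2.CAP_PROP_FPS)           # frames per second of the source video
    return cv2.VideoWriter(out_path, fourcc, fps, frame_size(first_frame.shape))
```

In the loop, `cap_out = make_writer(cap, frame)` is created once, `cap_out.write(frame)` goes just before the next `cap.read()`, and `cap_out.release()` at the end, next to `cap.release()`.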
And that's pretty much all; unless there's a final mistake, this is it. I press play; I am not visualizing the frames anymore, and if I go to my root directory, I don't see any output file yet. If I press play, everything should work just fine: we get a huge output with a lot of warnings; just ignore the warnings, they don't really matter in this case. So we are getting a lot of output, but nothing bad is going on, we're not getting any error, and if I go back here you can see that a new file called out.mp4 has been created, and my code is writing all the frames into it. Now the only thing we need to do is wait until this code has processed absolutely all the frames, so I'm just going to pause the video here and fast-forward to when it is completed.

Okay, the execution is now completed, so let's see how our video looks. This is the video we have generated, and you can see we have different colors for different IDs, for the different objects, for the different people we have detected, and it's working pretty well. If we look at this person, for example, you can see it's a very stable detection: we are detecting absolutely the same person in absolutely all the frames, and the color is always the same. If the color remains the same, it means the tracker hasn't re-identified this person as another person; it's always the same person, and the same happens with basically all the people I'm seeing over here. Maybe there are a few misdetections; for example, here you can see the detection goes a little crazy, it's not 100% proper, and there may be some more misdetections over here, cases where we are not getting a good detection, but nevertheless I would say this is a very, very good result.

So this is going to be all for this video; this is exactly how you can implement an object tracking algorithm using Deep SORT with an object detector like YOLOv8. And now that I look at it, I realize we are only seeing people; I don't see any bird through the entire video. Maybe I am missing it; if you notice the bird, please let me know in the comments below, but in the detections we were detecting a few birds. So something I may recommend you do is add something like: if score is greater than detection_threshold, then and only then append this detection to the detections list, and define detection_threshold as, I don't know, maybe 0.5. Let's execute it and see if everything works; it probably should, because it's only a very small edit, and this is going to remove some of the misdetections, maybe the birds we are detecting in this video when there aren't any birds whatsoever. So if you want, adding something like this, with some value for the detection threshold, is going to fix some of the misdetections you may have.

This is going to be all for this video. My name is Felipe, I'm a computer vision engineer, and in this channel I make coding tutorials exactly like the one you just watched, so if these are the kinds of videos you're into, I invite you to subscribe to my channel. This is going to be all for today, and see you in the next video. [Music]
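As a recap of that last fix, the score threshold can be sketched as a small filter over the unwrapped [x1, y1, x2, y2, score, class_id] rows; `filter_detections` is a helper name of mine, and 0.5 is the value suggested in the video:

```python
DETECTION_THRESHOLD = 0.5  # suggested value; tune it for your own video


def filter_detections(rows, threshold=DETECTION_THRESHOLD):
    """Keep only [x1, y1, x2, y2, score, class_id] rows whose score clears the threshold."""
    return [row for row in rows if row[4] > threshold]
```

Applied before `tracker.update`, this drops low-confidence detections such as the spurious "bird" (class 14) hits.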
Info
Channel: Computer vision engineer
Views: 70,825
Id: jIRRuGN0j5E
Length: 34min 33sec (2073 seconds)
Published: Mon Feb 13 2023