YOLOv10 Object Tracking on Live Webcam Step by Step Tutorial

Video Statistics and Information

Captions
Do you want to learn how to apply object tracking on top of the new YOLOv10 model? YOLOv10 was released just a few days ago and it is a pretty cool model: it is significantly faster than all the other models in the YOLO family. We're going to see how it performs and whether it keeps the same accuracy, and then how we can apply object tracking on top of it, looking at both StrongSORT and ByteTrack. We'll go through all the code line by line: how to set up the YOLOv10 model, run inference, and extract all the results so you can use them in your own applications and projects. At the end we'll run object tracking with YOLOv10 on a live webcam.

Let's start inside the YOLOv10 GitHub repository. We can see all the code here, but the most important thing is the comparison with the other YOLO models and object detection models out there. In red we have YOLOv10: it is significantly faster than all the other models, at around 2 milliseconds of latency, meaning it takes about 2 milliseconds to process a single image. On the y-axis we have the COCO average precision in percent, where it also does a pretty good job and basically outperforms all the other YOLO models. To download the weights you can just click the links and they will download to your computer; we'll drag them into our weights folder later on. The only thing you have to do after that is pip install this GitHub repository and we are good to go.

Let's jump straight into the code editor, where we have all the code for both the tracking and the YOLOv10 model. First of all I'm going to activate my YOLO environment. YOLOv10 is not in Ultralytics at the time I'm making this video — at some later point you might be able to use the model directly from Ultralytics — so right now we need to activate the environment and pip install it from the YOLOv10 GitHub repository: conda activate followed by my environment, which is called yolo, and then pip install -q git+ followed by the URL of the GitHub repository. Hit enter and it will pip install YOLOv10 into your local environment. After that is done we pretty much have everything; just make sure you also have the other dependencies like PyTorch, OpenCV and so on, which you can pip install directly as well.

Now let's minimize the terminal and take a look at the code. I have a single script that does everything for us: it creates a model for object detection and a model for tracking, and puts them together — it runs object detection, takes the results, and throws them into the tracker. The difference between object detection and object tracking is that with object detection we detect on each individual frame, whereas with tracking we follow the same objects over a number of frames, so we are tracking over time. First of all we import the different modules — torch, OpenCV, utils, Ultralytics and so on — and also supervision for visualizing our detections.
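To give a rough idea, a minimal sketch of the install step and the imports could look like the snippet below. The repository URL and the YOLOv10 import path are assumptions based on the public YOLOv10 release, not something shown verbatim in the video:

```python
# Run once in a terminal (not inside the script) to install YOLOv10:
#   conda activate yolo
#   pip install -q git+https://github.com/THU-MIG/yolov10.git  # assumed repo URL

import cv2                        # webcam capture, drawing, and display
import numpy as np                # array handling for the extracted detections
import torch                      # device selection (CUDA vs. CPU)
import supervision as sv          # annotating detections on the frame
from ultralytics import YOLOv10   # the pip-installed repo extends Ultralytics with YOLOv10

print("CUDA available:", torch.cuda.is_available())  # False on a MacBook, True on e.g. a 4090
```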
I also have repos in here for ByteTrack and StrongSORT, and I have a course teaching all the theory behind object trackers: how ByteTrack is implemented, how StrongSORT works, diving into the code line by line to see how you can actually implement object trackers from scratch. We go over the Kalman filter and all the theory behind it, and then set up the whole system and pipeline, so all the code is available in there — definitely check that out if you want to learn how to implement these trackers from scratch.

Going down in the script, we set a flag for which of the trackers we want to use: StrongSORT or ByteTrack. StrongSORT is a bit better when we want re-identification, meaning that once we lose track of an object it can re-identify it and assign the old ID to it again; it uses a neural network that looks at the visual features and matches them with previous ones. That makes StrongSORT a bit slower than ByteTrack, which is probably the fastest tracker out there. There is also a bunch of variations — SORT, StrongSORT, DeepSORT and so on — but these are all very good trackers.

Then we have our object detection class (it could just as well be called object tracking). In the initializer we specify the capture index: we're going to run this on a webcam — any USB camera connected to your computer — but it could also just be a video file that you throw in. First we check if CUDA is available on the computer, or else we fall back to the CPU. Right now I'm on my MacBook, so we'll use the CPU, but on an NVIDIA machine we can use CUDA and it will be significantly faster — I've been running this on a 4090 at around 100 frames per second with both object detection and tracking. And to be clear, this new YOLOv10 model is significantly faster than all the other models partly because it removes non-maximum suppression from the object detection post-processing.

We then have a function for loading the model, which we'll look at in a second; we extract the class names from the model, and we set up our box annotator with supervision. We also set up the re-identification weights: I have a weights folder over on the left with all of them. I got the YOLOv10 weights from the GitHub repository, but they will be available with Ultralytics as well, so they will download automatically once you set up the code. So we have a YOLOv10 model, and then we have this OSNet model, which is used for re-identification by the StrongSORT tracker. Finally we have our tracker setup, which checks which of the trackers we have chosen and sets up the configuration based on that.
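A minimal sketch of what that initializer could look like, under the structure described above; the exact constructor arguments (for example for supervision's BoxAnnotator) vary between library versions, and the weight file names are assumptions:

```python
from pathlib import Path

import torch
import supervision as sv
from ultralytics import YOLOv10

TRACKER = "bytetrack"  # or "strongsort" when re-identification matters more than speed

class ObjectDetection:
    def __init__(self, capture_index):
        self.capture_index = capture_index  # webcam index, or swap in a video file path
        # Prefer CUDA when an NVIDIA GPU is available; otherwise fall back to the CPU
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = self.load_model()
        self.CLASS_NAMES_DICT = self.model.model.names     # class-id -> name mapping
        self.box_annotator = sv.BoxAnnotator(thickness=2)  # supervision drawing helper
        # StrongSORT additionally needs re-identification weights, e.g. an OSNet model
        self.reid_weights = Path("weights") / "osnet_x0_25.pt"
        # self.tracker would be built here from TRACKER and its configuration file

    def load_model(self):
        model = YOLOv10("weights/yolov10n.pt")  # the YOLOv10 nano weights
        model.fuse()                            # fuse conv + batch-norm layers for speed
        return model
```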
Inside our trackers folder we have ByteTrack and StrongSORT. We set up the different thresholds: the threshold for track matching, the track buffer — how many tracks we want to buffer while tracking over a number of frames — and the frame rate. For StrongSORT we have some other parameters to specify as well; again, I go into detail and explain all of these in my YOLOv8 tracking course. We have the max distance between detections before we discard or remove a track, and the max intersection-over-union distance — how much the bounding box can move from frame to frame. We have max age: how long we want to keep each ID and track before we discard or terminate it. Say we get 10 detections and then miss five; with a max age of five we would remove that track from the tracker, so we're no longer tracking that object. We have the max number of unmatched predictions and n_init — basically how many frames we want to detect an object before we initialize a track and assign an ID — and a budget for the data association.

We have a load_model function where we just set up YOLO from Ultralytics: we specify the weights — this is the YOLOv10 nano model — fuse it, and return the model. Further down we call predict: we take in our frame, throw it through the model, and get the results out, and these results can then be extracted and thrown into our object tracking algorithm. We also have a function for drawing the results — this is the function that extracts all the results and applies the tracking, but you can use it for drawing or for triggering different kinds of systems: this is the output of your object detection and tracking project.

Taking a look at it, I have some lists for extracting all the relevant information and a for loop running through all the results — that is, all the detections we get from our object detection model, YOLOv10. We extract the class ID, and print the bounding-box results just to see what they actually contain. We check the class ID: if it's zero we just continue, and if it's greater we set the class ID equal to the first one — these details don't matter too much, it was just something I was playing around with. You could also check for specific classes: if you only want to detect persons, check whether the class ID equals 0 (person in the COCO classes) and run through the code only in that case, otherwise continue to the next prediction. That's how you can filter out classes when using the COCO classes. Here we just extract all the results: from the boxes we take the xyxy positions — the top-left and bottom-right corners — call .cpu() and convert to NumPy arrays, and append them to the lists at the top, one entry per individual prediction. Outside the for loop we create Detections from supervision, specifying the xyxy coordinates, confidences, and class IDs, so we can visualize them later. Then we extract labels: we set up strings with the class name and the confidence for when we annotate the frame.
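Here is a rough sketch of those predict and draw-results methods based on the description above. Note that passing labels directly to BoxAnnotator.annotate matches older supervision releases; newer versions split label drawing into a separate annotator:

```python
import numpy as np
import supervision as sv

class ObjectDetection:  # continuing the sketch from above
    def predict(self, frame):
        # One forward pass through YOLOv10; returns a list of per-image results
        return self.model(frame)

    def draw_results(self, frame, results):
        xyxys, confidences, class_ids = [], [], []
        for result in results:
            boxes = result.boxes.cpu().numpy()   # move tensors to CPU as NumPy arrays
            xyxys.append(boxes.xyxy)             # [x1, y1, x2, y2]: top-left, bottom-right
            confidences.append(boxes.conf)
            class_ids.append(boxes.cls.astype(int))

        detections = sv.Detections(
            xyxy=np.concatenate(xyxys),
            confidence=np.concatenate(confidences),
            class_id=np.concatenate(class_ids),
        )
        # One "class-name confidence" label string per detection
        labels = [
            f"{self.CLASS_NAMES_DICT[cid]} {conf:.2f}"
            for cid, conf in zip(detections.class_id, detections.confidence)
        ]
        frame = self.box_annotator.annotate(scene=frame, detections=detections, labels=labels)
        return frame, detections
```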
Then we call self.box_annotator.annotate, specifying the frame we want to annotate on, the detections we want to annotate, and the labels to put on top of the detections, and we return the frame and the bounding boxes so they can be visualized.

Next we have a __call__ method, which is basically the main function of the class. We open up our video capture with the capture index — you could also specify a video file directly here and it would run on the video, but right now I'm going to run it on a webcam. We assert that the capture is opened; if it's not, the program terminates. If the save-video flag is set, we save the results: everything with all the annotations gets written to a file, so we can play it back later, send it to whoever, or do whatever else we want with the video. We set our tracker, and we check whether the tracker has a model attribute with a warmup method; if so, we warm up the tracker model before we actually run — this is necessary for some AI models. We set up our outputs, current frame, and previous frame, and then we have our while True loop, which loads each individual frame from the video capture, whether it's a video file or the webcam.

For each frame we call the predict function — basically just taking the frame, throwing it through the model, and getting the results out — and then we pass the results to our draw_results function, which extracts everything, does the annotation, and returns the frame. Then we use that frame further down to update our tracker together with the results. The part at the top was just object detection; now we have another for loop running through the results for the tracker. Right after the tracker is initialized we do a camera update with the previous frame and the current frame, because when we're doing object tracking we track objects over a number of frames — over time. Then we call the update method on the tracker, which is basically an update-and-predict step, the same as in the Kalman filter. From the result frame we extract the outputs, which we can enumerate or run another for loop over, pulling out the bounding boxes, the track IDs, and the top-left corners so we can visualize them. And once we call an instance of our object detection class, it just invokes this __call__ method, and everything is up and running with the while loop.
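Under the same assumptions as the sketches above, the __call__ main loop might look roughly like this; the tracker's update and camera_update signatures are modeled on common StrongSORT/ByteTrack wrappers and are assumptions, not the exact API from the video:

```python
import cv2

class ObjectDetection:  # continuing the sketch from above
    def __call__(self):
        cap = cv2.VideoCapture(self.capture_index)
        assert cap.isOpened(), "Could not open the video capture"

        # Some tracker models benefit from a warm-up pass before real frames arrive
        if hasattr(self.tracker, "model") and hasattr(self.tracker.model, "warmup"):
            self.tracker.model.warmup()

        prev_frame = None
        while True:
            ret, frame = cap.read()
            if not ret:
                break

            results = self.predict(frame)                          # YOLOv10 detections
            frame, detections = self.draw_results(frame, results)  # annotate + extract

            # StrongSORT compensates for camera motion between consecutive frames
            if prev_frame is not None and hasattr(self.tracker, "camera_update"):
                self.tracker.camera_update(prev_frame, frame)

            # Predict-and-update step (Kalman filter); returns boxes with track IDs
            outputs = self.tracker.update(detections, frame)
            for output in outputs:
                x1, y1, x2, y2, track_id = output[:5]
                cv2.putText(frame, f"ID: {int(track_id)}", (int(x1), int(y1) - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

            prev_frame = frame
```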
So that's pretty much it — we don't have to do anything else. We have our weights over on the left, the re-identification networks, our StrongSORT algorithm, and our ByteTrack implementation. If we briefly go in and take a look at the tracker code — I won't dive into the details, because that would basically be a whole course; I have hours of video going through all of it, and it would just be boring in this video — this is basically everything that's in there. All the different tracking algorithms are based on the Kalman filter. We set up our base tracker, which basically takes the bounding boxes and does a prediction step: it takes the next frame — it always uses the previous frame and the current frame — tries to predict where that bounding box or object is now, and then updates that based on the actual detection we get from our YOLOv10 model. We can do multi-prediction or single prediction, some activation logic runs, and then we can put the track ID on top of our visualizations from supervision. So up at the top we did our object detection, and now we do the tracking, where we're just interested in extracting the bounding box and the track ID so we can put them on top of our annotations. That's pretty much the whole pipeline — about 150 lines of code in this class. Everything will be available inside my course, and I'll probably also put this class up on my GitHub so you can get the whole code structure.

If you go down to the bottom, we put text on the frame with the FPS, calculating how many frames per second this is running at. It's running on a CPU in my case, so we only get around 8-10 frames per second with the YOLOv10 model and the ByteTrack algorithm. We show the frame, which contains both the supervision annotations and our track IDs; if we're saving the video, we write out the frame; and if we hit Q or Escape on the keyboard at any time, we terminate the program and break out of the while loop, releasing the video writer so the recording actually gets saved. Then we create an instance of our object detection class and specify the capture index — which, again, could also be a video file.
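As a small self-contained sketch of just that last part — the FPS overlay, display, and quit handling (the window title and overlay position are arbitrary choices):

```python
import time
import cv2

cap = cv2.VideoCapture(0)  # capture index 0 = the default webcam
while cap.isOpened():
    start = time.perf_counter()
    ret, frame = cap.read()
    if not ret:
        break
    # ... detection and tracking would run on `frame` here ...
    fps = 1.0 / (time.perf_counter() - start)
    cv2.putText(frame, f"FPS: {int(fps)}", (20, 40),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("YOLOv10 tracking", frame)
    if cv2.waitKey(1) & 0xFF in (ord("q"), 27):  # quit on 'q' or Escape
        break
cap.release()
cv2.destroyAllWindows()
```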
There's also a lot of conversion back and forth between bounding-box formats — top-left corner plus width and height, top-left and bottom-right corners, and so on — all that boring bookkeeping, plus the whole data structure that keeps track of the bounding box, class ID, track ID, confidence score and so on, and the matching algorithm in the base track. If we take a look at StrongSORT it is very similar — all of these trackers pretty much build on top of each other — but it also has the re-identification model, which uses neural networks to do matching, re-identification, and tracking, which is pretty cool. We initialize the tracker, and all the code is in there; it's basically a whole repo with a bunch of different files, as you can see — the Kalman filter, linear assignment, neural-network matching and so on. You can also see the track class, which is basically where the data structures are saved. There's a ton of code in there; I go through all of it inside my course, and it would take hours here, so right now I'm just showing you the base class with object tracking, YOLOv10, and ByteTrack.

Now we're ready to run it. We don't have to do anything other than run this Python script, so I just run the tracking script, hit enter, and we should be good to go. It opens up the video stream in a second — we're running on the CPU — and there we go: I'm in the frame with ID 1 assigned to me at the top. We have a person with a confidence score; I can move back and forth and it keeps track of me, so instead of just doing predictions on each individual frame — where a missed detection would make things jump back and forth — we get a stable ID.

We can also play around with the confidence score, but now that we have IDs assigned we can do both object tracking and counting. Say we have a conveyor belt with objects moving on it: we can count how many objects pass by. We can do image analytics, traffic analytics — pretty much everything you can imagine. In most real-time computer vision applications we want to apply object tracking on top of the detections, precisely so we get these IDs and don't get double detections and all that. We're getting around 10 frames per second here running on the MacBook, which is pretty good with tracking, and you can see it's quite responsive as well, so this could actually work out of the box on a CPU in real-world applications and projects. Down at the bottom we can see all the extracted values being printed — xyxy values, confidence scores and so on.

Let me try some other objects. Here we have a cell phone with ID 6; I can move it around and we keep track of it — it's still ID 6. It actually just did a re-identification there: you saw it miss the detection and recover. Let's go back again: ID 6, ID 6. And there it swapped the ID, probably because the bounding box was moving too much in the frame — that is one of the parameters I mentioned: if the bounding box moves too much from one frame to the next, we delete that track. But we can move it around pretty fast and it still keeps track; with more frames per second it would of course be better, but this looks pretty good.

So this is how we can run object tracking on a live webcam: a few lines of code, not really too much, with the whole class structure specifying the tracking model and the object detection model, and the whole codebase up and running. You can take YOLOv8, YOLOv9, YOLOv10 — an arbitrary object detection model — and you just need to make sure the output detections are in the correct format, and then you're good to go and can apply this in your own computer vision applications and projects.

Thank you guys for watching this video; I hope you've learned a ton. Definitely try this out on your own — the new YOLOv10 model, the new member of the YOLO family, is very good and very fast, and it runs significantly faster than the other models in the tests I've done. I'm also going to create videos comparing all the models, and I have a video on custom object detection: how you can take your own dataset — I label the dataset, generate images from Python by opening up the webcam, label them, put them into a Google Colab notebook, train, export the model, and run live inference with a custom model. There are videos for all of that, so definitely check them out, and I hope to see you in one of the upcoming videos. Until then, happy learning!

We also have an AI career program if you want to learn how to land AI jobs and get AI freelance work. I teach you everything in there: we have programs, all my technical courses, weekly live calls, and personal help, and I would love to have you in there and help you out in any way possible. You can check out the program and the community in the description down below, and I'll see you in there.
Info
Channel: Nicolai Nielsen
Views: 6,231
Keywords: object tracking, deep learning, computer vision, yolov10 live webcam, yolo v10, YOLOv10, Yolov10 obejct detection, yolo v10 object tracking, Strongsort, ByteTrack, Tracking Yolov10 and strongsort, Object Tracking Yolov10, Obejct Tracking Yolov10 and Bytetrack, Obejct traking with Bytetrack, Yolov9, Yolov8, YOLO object tracking, computer vision python, yolo v10 object detection, yolov8 object detection, bytetrack algorithm, Obejct tracking live webcam
Id: Zf3tqXacN1E
Length: 18min 58sec (1138 seconds)
Published: Sun Jun 02 2024