Track & Count Objects using YOLOv8 ByteTrack & Supervision

Video Statistics and Information

Captions
I love playing around with new models, and we've been playing a lot with the latest YOLOv8 on this channel, training both object detection and instance segmentation models. However, the moment it gets super exciting for me is when we put those models to the test and build something useful with them, and today we'll build an object detection, tracking, and counting system. It may seem hard, but with the right tooling it's like connecting small Lego blocks. Today we'll use the latest YOLOv8 for object detection, the state-of-the-art ByteTrack for tracking, and the latest library from Roboflow, called supervision, for counting.

We'll go through two use cases. The first one, very stereotypical, is counting cars moving on a street. The second one, a lot less stereotypical but very much real, is counting objects moving on a conveyor. The coolest thing about it is that we will use exactly the same code in both cases, so there is a very high chance that you will be able to reuse it in your own project straight away. Before we start, make sure to hit the like and subscribe buttons; it helps the channel a lot and helps me stay motivated, which is very important in long projects like this one. So without further ado, let me show you how to detect, track, and count using computer vision. One more thing: this video is very long, most likely the longest I have ever recorded, so don't hesitate to use the timestamps below to navigate to the part that is most interesting to you.

Like with all our tutorials, we start at the Roboflow Notebooks repository, and this time we scroll a little bit lower, not into the model tutorials but into the computer vision skills, where there is a new notebook: track and count vehicles with YOLOv8, ByteTrack, and supervision. So let's open that and go to Colab. As usual we created a notebook, but I decided I will not use it; you can still follow it in the meantime. I decided the biggest value I can give you is not to go through the notebook, but to show you how you can go from "I have a video" to "I have a working computer vision system that can detect, track, and count objects". That's why I will create a copy of this notebook, remove everything that is not installation of libraries or downloading of data, and we'll see each other in just a bit.

OK, like I said, the copy is created. Let me start by increasing the font size, so you will have an easier time following the tutorial, and we are good to go. Let me scroll through the introduction; maybe the most important part worth mentioning is that YOLOv8 is still under heavy construction, so we need to be cautious. It's better to have a specific version of the library pinned, because the code you are currently developing can break at any moment. Let's start by executing nvidia-smi; we just want to confirm that we have access to a GPU, because computer vision models run significantly faster on a GPU than on a CPU. It takes a little bit of time, because the environment needs to be set up, but we have a Tesla T4 here. The next thing is to create a HOME variable, just to make it a little bit easier for us to manage paths and always know where we are; with time I have come to appreciate that working pattern, so I can only encourage you to adopt it as well. The next stage is to download the video that we'll be using. This is the video of cars moving on a street, nothing very unusual; you saw it in the intro already, and we'll use it as our first example.
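A minimal sketch of that setup stage, assuming a Colab-style notebook; supervision 0.1.0 is the version named in the tutorial, while the ultralytics pin is my assumption (the video only insists that you pin one):

```python
# Confirm that the runtime has GPU access (Colab shell command).
!nvidia-smi

import os

# Keep track of the working directory so later paths can be built from HOME.
HOME = os.getcwd()
print(HOME)

# Pin versions so upstream API changes don't break the notebook later.
# ultralytics==8.0.20 is a plausible early-2023 pin, not confirmed by the video.
!pip install ultralytics==8.0.20 supervision==0.1.0

# ByteTrack has no pip package at this point; it is installed from source
# (git clone + requirements + setup), as described in the video.
```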
Now, the installation of YOLOv8. We have separate videos on YOLOv8 in which I go very deep into installation, the CLI, and the SDK; you most likely see that video in the top right corner right now, and you can also find it in the description below. Like I said, it's good practice to have a specific version of YOLOv8 pinned, because they are breaking the API on a daily basis right now. I will try to make this notebook work with the latest version regardless of when you are watching, but for your own sake it's much better to pin it, so that you don't need to fix things later on your own.

The next thing is the installation of ByteTrack. Contrary to YOLOv8, which now has a pip package and an SDK, ByteTrack is packaged terribly, so we need to go through a lot more steps over here. The last part is supervision, version 0.1.0, my little baby, so be gentle. I'm just joking; if you see any problems, if something doesn't work for your specific use case, make sure to create an issue on the supervision repository. The same goes for the notebooks: if you see that something doesn't work, don't hesitate to create an issue, and we'll try to help you out and solve the problem. It seems that we were using test PyPI; let's change it to the regular one, so just remove this. And like I said, it's always better to pin the version, especially with supervision: we are still in the pre-release stage, and the API may change dramatically by the time you watch this video, but if you stick to version 0.1.0 you most likely won't face any problems, because it's already deployed to the pip package manager. So let's just hit enter. It works; I was stressed for a second. Honest mistake, guys, honest mistake. Now we can load a pre-trained model, just to confirm that everything works, and we have everything ready.

The first thing that we can do, basically without any effort, is to run the YOLOv8 CLI on that video; that will give us a feeling for whether our model is strong enough. So let's do that. Let me copy and paste the line with the example prediction, but what we'll be using is task detect, mode predict, the X version of the model, and we'll pass our video as the source; we created a temporary variable over here that we can now use. Let's do it like this, press enter. When that is done, we can go to runs/detect/predict, and we have the result video that we can download. Downloading files from a Jupyter notebook is notoriously slow, so I will just use the magic of cinema. Let's take a look at the results: the detections are looking pretty stable, we get some random ones here and there, but nothing that we wouldn't be able to filter out, so I'd say we are good to continue.
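That CLI call could look like the line below; SOURCE_VIDEO_PATH stands for the temporary variable holding the downloaded video (the exact variable name is an assumption based on the notebook):

```python
# Run the largest pretrained YOLOv8 checkpoint over the whole video.
# Results are saved to runs/detect/predict by default.
!yolo task=detect mode=predict model=yolov8x.pt source={SOURCE_VIDEO_PATH}
```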
The CLI is great when you want to do something typical, but the moment your use case becomes a little bit custom, you pretty much need to migrate to Python scripting, and that's exactly what we are going to do. We will use supervision to recreate the inference loop, which, by the way, answers one of the most frequently asked questions of the last few months: how do I run a model on every frame of a video? Now I have a pretty much one-line answer for you. Then we will use the YOLOv8 model imported from the Python SDK to run inference inside that loop. Along the way, I will do my best to explain how the supervision utilities work; we don't have documentation yet, so I decided to go one by one and explain how to connect everything together.

First things first, let's start with the frame generator. This is a piece of code that will allow us to read frames from the video one by one: instead of loading them all into memory at once and most likely running out of RAM really fast, we will just load one frame at a time. supervision has a dedicated method for that, called get_video_frames_generator. That method takes only one parameter, the source video path, and obviously we need to import that utility from supervision. How do we use it? You just write something like: for frame in generator, and you can use that frame inside the for loop. However, for now we will not use it this way; we will just pick one frame, I will explain everything on that single frame, and then we will put everything inside the for loop. To pick just one frame, we first need to create an iterator; it's pretty easy, we just call iter on the generator, and then we can pick the next frame with next. At this point, if I printed the type of this frame, you would see that it's a NumPy array, which is what we would expect; that's usually what we use to store images. If I want to display that frame inside the notebook, we also wrote a small utility for that, called show_frame_in_notebook; everything is explained in pretty straightforward language, so I hope you will manage to use it on your own. Obviously, like in the previous case, we need to import that particular utility into the notebook, so let's do that. At this point, we are able to pick a single frame from the video and display it in the notebook.

The next thing we can do is run the model. For a second, let's comment this out and add results = model(frame); that will return a list of predictions, because you can pass a list of frames or a NumPy array of frames. Because we are passing a single frame, we will retrieve the first element from that list: a result object coming from the YOLOv8 pip package. I can now access boxes and, for example, xyxy, and that will return a PyTorch tensor containing the locations of the objects on the frame; to work with supervision, we will need to convert those into NumPy arrays. We also have confidence, where we see values between 0 and 1, and the classes. supervision is a general-purpose library, so we need to convert what we get from YOLOv8 into an object that supervision understands, and that object is Detections. Let me just copy the boilerplate, so that we don't waste a lot of time; because I already picked the first element from results, I don't need to do this.

BoxAnnotator is another class, and it can be used to draw bounding boxes on the frame. We already imported Detections and BoxAnnotator, so let's now create an instance of BoxAnnotator. Importantly, that instance takes a ColorPalette, something that we also need to import; a ColorPalette is basically the set of colors that will be used by that particular annotator. At this point, the only thing we need to do is annotate the frame with those detections: we pass the raw frame, we pass the detections, plus labels, but I will skip those for a second. If I now uncomment show and run that, we get the annotated frame with bounding boxes and, in this case, probabilities. But maybe I don't like those probabilities and would like something more custom; what can I do? We could, for example, print the class ID. How can we do that? We can pass an additional parameter to the annotator called labels; it was here before, but I removed it. If, for example, I want to print just the class ID and the confidence, I can do something like this, and the annotations get updated: right now I see the class ID of each detection and its probability. That is not really useful, though, because humans work much better with text than with numbers.
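Put together, the single-frame experiment could look like this sketch. The module paths follow the supervision 0.1.0 layout used in the notebook (later releases expose everything through a single top-level import), so treat them as version-specific assumptions:

```python
import numpy as np
from ultralytics import YOLO

# supervision 0.1.0 module layout (assumed; newer versions use `import supervision as sv`).
from supervision.video.source import get_video_frames_generator
from supervision.notebook.utils import show_frame_in_notebook
from supervision.tools.detections import Detections, BoxAnnotator
from supervision.draw.color import ColorPalette

model = YOLO("yolov8x.pt")

# Pull a single frame out of the lazy generator.
generator = get_video_frames_generator(SOURCE_VIDEO_PATH)
iterator = iter(generator)
frame = next(iterator)  # a plain NumPy array of shape (H, W, 3)

# Run inference and convert the YOLOv8 result into a supervision Detections object.
results = model(frame)
detections = Detections(
    xyxy=results[0].boxes.xyxy.cpu().numpy(),
    confidence=results[0].boxes.conf.cpu().numpy(),
    class_id=results[0].boxes.cls.cpu().numpy().astype(int),
)

# Label each box with class id and confidence; in this version, iterating
# Detections yields (xyxy, confidence, class_id, tracker_id) tuples.
labels = [
    f"{class_id} {confidence:0.2f}"
    for _, confidence, class_id, _ in detections
]

box_annotator = BoxAnnotator(color=ColorPalette(), thickness=4, text_thickness=4, text_scale=2)
frame = box_annotator.annotate(frame=frame, detections=detections, labels=labels)
show_frame_in_notebook(frame, (16, 16))
```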
Models in YOLOv8 contain a property called names, which lets you map class IDs to class names, so let's create a constant that stores that dict for us. Now we can build our labels differently: instead of printing the class ID, we map it to a class name, and I can print that. In just a second I should see that it's a truck, with 93 percent probability. So this is the example for a single frame; the next step is to convert it into processing the whole video, and then add additional layers of complexity with the tracker and the counter.

Like I said, if we want to convert code that works on a single frame so that it runs on the full video, we need to loop over the video frames coming from the generator. Instead of picking a single frame with next, we will use a for loop; we can do it the easy way, like this: for frame in generator, and just indent everything by a single tab. That code would basically display every frame for us, like it's doing for that single frame below. What we would actually like to do is save all those frames in the form of a video, and that is possible with another class from the supervision library called VideoSink. Let me copy that into the notebook, and now I indent everything one more time. VideoSink expects two arguments: the path to the output file and an instance of the VideoInfo class. If I paste that single line and also import VideoInfo, I should be able to get a little bit more information about the source video we are using; I know, for example, the resolution, the FPS, and the total number of frames. That information can be passed to VideoSink so that it can create an output video file with the same resolution and the same FPS. Now we should save that VideoInfo to a separate variable, and the code is pretty much ready to run the model on every frame and save the results. Ah, the only thing we still need to do is remove show_frame_in_notebook, because we don't want to display those frames, we want to save them; VideoSink has a method for that, called write_frame. With that small change, we are basically ready to process the video.

There is one more thing we can do. The easy way to loop over frames is a plain for loop, but when I'm processing long video files I would like to know my progress. We can use tqdm, a Python package that creates those very nice-looking progress bars. tqdm requires something it can iterate over and the total number of items we are going to iterate over; because I already created the VideoInfo instance, I know the total number of frames in my video. Now I just need to import tqdm, and I should be good to go; I just have to remember to import the notebook version of tqdm, because there are different versions for the terminal and for notebooks. We can now press shift enter and just wait for the video to get processed. So far so good.
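The whole-video loop could look like the sketch below, reusing the model, annotator, and Detections conversion from the single-frame step. TARGET_VIDEO_PATH is a name I'm introducing for the output file, and the VideoInfo/VideoSink calls assume the 0.1.0 API:

```python
from tqdm.notebook import tqdm

from supervision.video.dataclasses import VideoInfo
from supervision.video.sink import VideoSink

# Resolution, FPS, and total frame count of the source video.
video_info = VideoInfo.from_video_path(SOURCE_VIDEO_PATH)

# Friendly class-id -> class-name mapping taken straight from the model.
CLASS_NAMES_DICT = model.model.names

generator = get_video_frames_generator(SOURCE_VIDEO_PATH)

# VideoSink writes frames to TARGET_VIDEO_PATH with the source resolution/FPS.
with VideoSink(TARGET_VIDEO_PATH, video_info) as sink:
    for frame in tqdm(generator, total=video_info.total_frames):
        results = model(frame)
        detections = Detections(
            xyxy=results[0].boxes.xyxy.cpu().numpy(),
            confidence=results[0].boxes.conf.cpu().numpy(),
            class_id=results[0].boxes.cls.cpu().numpy().astype(int),
        )
        labels = [
            f"{CLASS_NAMES_DICT[class_id]} {confidence:0.2f}"
            for _, confidence, class_id, _ in detections
        ]
        frame = box_annotator.annotate(frame=frame, detections=detections, labels=labels)
        sink.write_frame(frame)
```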
Now that we have that part done, we can think about plugging in the tracker. In the past we used ByteTrack with YOLOv5 to track football players on the field, so if you want to learn more, feel free to watch that video; the card should be visible in the top right corner right now. Today I will just copy the boilerplate, plug the tracker into our pipeline, and focus mostly on counting. OK, we are back in our Jupyter notebook, and at this point our main objective is to plug in the tracker. The tracker itself is already installed in the environment, so there is nothing really stopping us from creating an instance of it; let's do that, maybe a little bit above the generator. The tracker takes its arguments in the form of a Python class, and that class was defined just a cell below the installation code. As I said in the YOLOv5 video, there is a small problem with matching the predictions coming from the YOLO model with the tracked object positions, with their assigned IDs, coming from ByteTrack, so there is a little bit of boilerplate code that we need to copy and paste. Once again, if you want to understand that section better, I highly encourage you to watch that video; for now I'll just paste it over here and forget it existed, and we will simply use it inside our main for loop. Like I said, we created an instance of BYTETracker just above our main for loop, and here I'm using that instance to acquire tracker IDs. It all seems pretty complicated, but it's really not; it's just something that needs to be there, so don't get overly intimidated by it. Finally, we can update our label generation to include not only the class ID and the confidence but also the tracker ID. Let's run that code and take a look at the results. Slowly but surely we are getting somewhere: our previous output just got enriched with tracker IDs.

The only thing left is to draw the line and count the objects crossing it. That part is actually pretty easy; the only thing we really need to think about is the location of the line. I spent a little bit of time pinpointing the right location and came up with these two points, so let me paste them over here. We are using a class from the supervision library called Point, so I need to import it to avoid errors; press shift enter, and we should be good to go. Those two points, the start and end points of the line, are the arguments used in the constructor of LineCounter; you can see I'm using line start and line end. One more thing we need is a way to display that line, and for that we need another annotator: you remember we had the BoxAnnotator that we used to draw detections, and now we have another annotator that we will use to draw the line. When that is done, we need to let the line know where the detections are, so that it can determine whether it got crossed by them, and at the very end of our main loop we annotate the line, so that we can see the current counter values. That's it, I guess; we obviously also need to import those two classes. Now we can just press shift enter and wait for the final results.
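A condensed sketch of the tracking and counting additions, under the same version assumptions. Here detections2boxes and match_detections_with_tracks stand for the matching boilerplate copied from the notebook (not reproduced here), and the line coordinates are placeholders, since the video doesn't spell out the values it uses for the street scene:

```python
from dataclasses import dataclass

import numpy as np
from yolox.tracker.byte_tracker import BYTETracker

from supervision.geometry.dataclasses import Point
from supervision.tools.line_counter import LineCounter, LineCounterAnnotator

# ByteTrack reads its settings from a plain object, not keyword arguments.
@dataclass(frozen=True)
class BYTETrackerArgs:
    track_thresh: float = 0.25
    track_buffer: int = 30
    match_thresh: float = 0.8
    aspect_ratio_thresh: float = 3.0
    min_box_area: float = 1.0
    mot20: bool = False

byte_tracker = BYTETracker(BYTETrackerArgs())

# Placeholder line endpoints; pick points that fit your camera view.
LINE_START = Point(50, 1500)
LINE_END = Point(3840 - 50, 1500)
line_counter = LineCounter(start=LINE_START, end=LINE_END)
line_annotator = LineCounterAnnotator(thickness=4, text_thickness=4, text_scale=2)

generator = get_video_frames_generator(SOURCE_VIDEO_PATH)
with VideoSink(TARGET_VIDEO_PATH, video_info) as sink:
    for frame in tqdm(generator, total=video_info.total_frames):
        results = model(frame)
        detections = Detections(
            xyxy=results[0].boxes.xyxy.cpu().numpy(),
            confidence=results[0].boxes.conf.cpu().numpy(),
            class_id=results[0].boxes.cls.cpu().numpy().astype(int),
        )

        # Match YOLOv8 detections with ByteTrack tracks; these two helpers are
        # the boilerplate functions pasted from the notebook.
        tracks = byte_tracker.update(
            output_results=detections2boxes(detections=detections),
            img_info=frame.shape,
            img_size=frame.shape,
        )
        tracker_id = match_detections_with_tracks(detections=detections, tracks=tracks)
        detections.tracker_id = np.array(tracker_id)

        # Drop detections that did not get matched to any track.
        mask = np.array([tid is not None for tid in detections.tracker_id], dtype=bool)
        detections.filter(mask=mask, inplace=True)

        labels = [
            f"#{tracker_id} {CLASS_NAMES_DICT[class_id]} {confidence:0.2f}"
            for _, confidence, class_id, tracker_id in detections
        ]

        # Count crossings, then draw the boxes and the line with current counts.
        line_counter.update(detections=detections)
        frame = box_annotator.annotate(frame=frame, detections=detections, labels=labels)
        line_annotator.annotate(frame=frame, line_counter=line_counter)
        sink.write_frame(frame)
```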
A funny fact about the whole experience is that we spent most of our time not on tracking or counting but on recreating the inference loop; that just shows you how many useful functionalities are buried deep inside libraries like YOLOv8. Not sure about you, but I'm craving something sweet, and that's perfect, because in our second use case we will be tracking and counting chocolate candies moving on a conveyor. Unfortunately, this time we will not be able to get away with a model pre-trained on the COCO dataset, because there is no candy class in it, so before we can even think about building our tracking and counting pipeline, we need to train our YOLOv8 model on a custom dataset. I extracted a few dozen frames from videos that I had and threw them into Roboflow. The annotation was pretty straightforward: the candies are located pretty far away from each other, so bounding boxes are perfect for that use case. I tried to minimize the amount of manual work I needed to do, so I decided to use augmentation to artificially increase the number of images in my dataset. I didn't use anything super crazy, just the regular stuff: a horizontal flip, a little bit of rotation, and shear. All in all, I ended up with around 60 images in my dataset; not a lot, but given that it's a very controlled environment, I was hoping for a positive result. Now that I have the dataset, I can jump into Google Colab and train the model.

We will use the same Jupyter notebook that we used for the YOLOv8 object detection tutorial. Let's open it in Google Colab and straight away create a copy on our Google Drive; the new notebook opens in a separate tab, so we can close the previous one. Make sure that the runtime is GPU accelerated; it is. We quickly go through the installation process for Ultralytics YOLOv8; we covered it in our previous use case, so there is nothing really new happening over here. Let's skip the CLI inference, as that stuff was already covered in the YOLOv8 object detection video, and go straight into training on the custom dataset. I go to my Roboflow profile, sign in, open my chocolate candy dataset, get the snippet that I can copy into my Google Colab, and paste it here; the dataset downloads into my environment, and I will be able to use it for custom training. Now the only thing left is to run the training. I will pick 25 epochs, which should be just enough for such a small dataset; shift enter, and let's wait for the model to train. That was really fast, just around three seconds per epoch or so. We can now examine the results to confirm that our model is behaving correctly. It doesn't need to be perfect, just good enough; given that, like I said, it's a very controlled environment, I don't see a lot of opportunity for false detections, and in my opinion that's more than enough for us to use it for tracking. In that case, the only thing still left to do is zip our runs directory, so let's do that right now. Great; click download, and we move to the second Jupyter notebook, where we will use those weights to detect, track, and count the candies.
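A sketch of the custom training step. The download snippet is the kind the Roboflow UI generates (the workspace, project, and version names below are placeholders), and the choice of base checkpoint is an assumption, since the video doesn't say which one it trains from:

```python
# Download the chocolate-candy dataset exported from Roboflow in YOLOv8 format.
!pip install roboflow

from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")                                   # placeholder key
project = rf.workspace("your-workspace").project("chocolate-candies")   # placeholder names
dataset = project.version(1).download("yolov8")

# Train for 25 epochs; plenty for ~60 images in a controlled environment.
!yolo task=detect mode=train model=yolov8s.pt data={dataset.location}/data.yaml epochs=25

# Zip the training artifacts so they can be downloaded and reused elsewhere.
!zip -r weights.zip runs/
```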
Once again we are back at Roboflow Notebooks; we scroll to computer vision skills and open track and count vehicles with YOLOv8 and ByteTrack, the same notebook I mentioned at the beginning of the video. There are two things we need to do at the very start: first, upload our ZIP with the weights; second, upload the example video that we would like to use for inference. Here is the video, and here are the weights. The upload may take a little bit of time, nothing unusual for Google Colab; in the meantime we go super quickly through the installation process, as that part was covered in this video already, so there's no point in going through it once again. I just update the source video path to candies.mp4, and I can continue with installing YOLOv8, ByteTrack, and supervision. All required libraries are installed, and the weights just finished uploading, so we can now unzip them; let's do that with unzip weights.zip. Now we switch the model path from yolov8x to runs/detect/train/weights/best.pt, like this, and load the model to confirm that everything works properly. Here we also need to update the class ID: there is only one class, candy, and that is the only one we are interested in. Shift enter, and that should give us the inference result for a single frame; if everything is fine, we'll just run the processing for the whole video. That looks pretty promising, I'd say.

Ah, before we can run the processing, we need to think about the line positioning. I'm thinking something diagonal, like this: we will start maybe at (50, 50) and finish in the same place, just in the opposite corner, so it will be (3840 - 50, 2160 - 50). That should give us quite a nice location for the line, and the output file should be named, I don't know, candy-result. Shift enter, and let's execute the processing for the whole video. Now we can use the magic of cinema and just examine the results. The counting works perfectly; it's just a shame that the video we decided to use is so short, so we don't get to watch it perform for longer, but still, I would count that as a success. You can clearly see that we can switch the use case and the model, and the whole pipeline behaves flawlessly.

That's all for today. I hope that tracking and counting using YOLOv8 is much less terrifying now. Make sure to like and subscribe, and stay tuned for more computer vision content coming to this channel soon. My name was, or should I say is, Peter, and I'll see you guys next time. Bye!
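For reference, the handful of constants that change for the candy use case could look like this sketch, with path names assumed from the walkthrough; everything else in the pipeline (the inference loop, the tracker, the line counter) stays exactly the same:

```python
from ultralytics import YOLO
from supervision.geometry.dataclasses import Point

# Swap in the candies video, the custom weights, and a new output name.
SOURCE_VIDEO_PATH = f"{HOME}/candies.mp4"
TARGET_VIDEO_PATH = f"{HOME}/candy-result.mp4"
model = YOLO(f"{HOME}/runs/detect/train/weights/best.pt")

# The custom dataset has a single class: candy (class id 0).
CLASS_ID = [0]

# Diagonal counting line across the 3840x2160 frame, inset 50 px from the corners.
LINE_START = Point(50, 50)
LINE_END = Point(3840 - 50, 2160 - 50)
```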
Info
Channel: Roboflow
Views: 83,256
Keywords: yolo, tracking, multiple object tracking, multitarget tracking, traffic control, yolov8, bytetrack, deepsort, yolo tracking, yolo v8, object tracking, sort, yolo object tracking, yolov8 object tracking, yolo v8 object tracking, python real time object tracking, video object tracking, deep sort, vehicle tracking, conveyor tracking, computer vision tutorial, computer vision
Id: OS5qI9YBkfk
Length: 26min 11sec (1571 seconds)
Published: Fri Jan 20 2023