YOLOv8 Object Counting in Real-time with Webcam, OpenCV and Supervision

Video Statistics and Information

Captions
One of the main reasons the original YOLO algorithm took the world by storm in 2015 was its ability to run object detection in real time. Today I will use the latest iteration of that algorithm, YOLOv8, to build and run a simple object counting application on my webcam stream. This video is actually part 3 of the series where we use the pip package supervision to build different kinds of video analytics applications, so if you are interested in the rest of those videos, click the link in the description or in the card in the top right corner. Now, without further ado, let's get our hands dirty.

Okay, so we start in an empty directory, and the first thing we'll do is create a Python virtual environment. We activate it, and after that we install the first dependency we need: ultralytics. It is the package that contains the YOLOv8 model. Internally it installs heavy dependencies like PyTorch, so the whole installation process might take a while.

So far so good. Let's use the YOLO CLI to test whether or not the installation finished properly. We see that the model weights are being downloaded, and after a few seconds we see a live video stream with bounding boxes. The performance of the model is really good; even with the large version of the model we still get inference in just a little bit over 10 milliseconds. The one thing that is a bit weird is the oscillation of the bounding boxes, but we'll try to fix that later.

Now let's try to recreate the same effect without the CLI, using the Python SDK. The CLI is perfect when you just want to make something quickly and simply; however, the moment you need to do something custom, like in our case building a small application to count objects within a zone, you need to use the SDK. Before we start, make sure to like and subscribe and hit the bell button, so that you will be the first to know about tutorials like this in the future.

First things first, we'll create an empty Python file, create a main function that will just pass for now, and run that function if we run the whole file as a standalone script. Now let's add a simple hello to the main function and run it in the terminal, just to confirm that everything works as expected, and sure enough it does. Now let's access the webcam. The first thing we'll do is import cv2; then, in the main function, we will create a video capture with video device index 0.
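A minimal sketch of this starting point. The shell commands are shown as comments (the exact YOLO CLI flags vary between ultralytics versions), and the Python file just prints a greeting and opens video device 0 to confirm webcam access:

```python
# setup, run in a terminal (exact CLI flags vary between ultralytics versions):
#   python3 -m venv venv
#   source venv/bin/activate
#   pip install ultralytics
#   yolo predict model=yolov8l.pt source=0 show=True  # quick CLI sanity check

import cv2


def main():
    print("hello")  # the quick sanity check mentioned above

    # video capture on device index 0, i.e. /dev/video0 on Linux
    cap = cv2.VideoCapture(0)
    print("webcam opened:", cap.isOpened())
    cap.release()


if __name__ == "__main__":
    main()
```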
So if you go to the terminal, ls everything in the /dev directory, and grep for video, you will see that there are multiple video devices; we will access the one with index 0. Let's create a while loop and read the frame inside that loop, and just to test, we'll imshow the current frame. Let's also allow for exiting the while loop: if I press Escape, the while loop will break. Cool. There are two integers we use in this line: the first one, 30, is the number of milliseconds that OpenCV will wait for us to hit any key on the keyboard, and 27 is the integer that represents the Escape key in the ASCII table.

Now I will run the script, and after just a few seconds I can access the webcam. You can see me waving my hand over here; we see it on the screen. The one thing that is a little bit weird is the resolution of the video; that webcam should have a higher resolution. Okay, so let's try to fix that. The first thing I will do is comment out the breaking mechanism, print the shape of the current frame, and break regardless. Now if we run it in the terminal, we'll see the current resolution, and it is 640 by 400. That webcam should be able to run at a higher resolution, so let's make it configurable.

I'll import argparse and create a new function, parse arguments. That function will take no arguments, but it will return an argparse Namespace. We will now create a new argument parser and name it "YOLOv8 live". Now we can add the first argument; let's call it webcam resolution. It will be a list of int values, and since it is a resolution, we will have only two values in the list. Let's also give it a default value, for example 1280 by 720. Now let's break the lines to make it a little cleaner and more readable, and after that argument is added, we will parse the args and return them from the function. Now we can go to main and call that function so that we can use the resolution.

Let's use cv2 set to select the resolution of our webcam; we need to set the width and the height of the frame. We will unpack our list right after we parse the arguments, so we will get frame width and frame height from args webcam resolution, and we'll pass those values to our set calls. Now if we run the script in the terminal, we'll see that the frame resolution has been updated, so we can uncomment our loop-breaking logic, and we should be able to see the webcam stream. Exactly; you see me waving over here.

Now we are ready to plug in the YOLO model. A while ago we used the YOLO CLI to run the model from the terminal, and that command took an argument called model with the value yolov8l.pt; this is the name of the weights that are passed to the model. Let's do the same, but in the SDK. I'll start by importing YOLO from ultralytics and then create a model instance, passing the same weights name, yolov8l.pt, as an argument. Now, inside the while loop, I will infer on the current frame, so let's do results equals model of the current frame and run it in the terminal. After a few seconds I see the current frame, but I don't see any detections. When I take a look in the terminal, however, I see that for the given frame we detected one cup, one apple, and one pair of scissors, which is just about the content we would expect. Printing detected objects in the terminal is cool, but it would be even cooler if we could draw bounding boxes on the frame. So let's install supervision and annotate our video stream.
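A sketch of the script as described so far: the argparse resolution flag, the capture loop with the Escape-key exit, and YOLOv8 inference via the ultralytics SDK. The flag name and defaults follow the video's narration; whether `cap.set` actually changes the resolution depends on what modes your webcam driver supports.

```python
import argparse

import cv2
from ultralytics import YOLO


def parse_arguments() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="YOLOv8 live")
    parser.add_argument(
        "--webcam-resolution",
        default=[1280, 720],
        nargs=2,
        type=int,
    )
    return parser.parse_args()


def main():
    args = parse_arguments()
    frame_width, frame_height = args.webcam_resolution

    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, frame_width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, frame_height)

    # downloads yolov8l.pt on first use
    model = YOLO("yolov8l.pt")

    while True:
        ret, frame = cap.read()

        result = model(frame)[0]  # detections are printed to the terminal

        cv2.imshow("yolov8", frame)

        # wait up to 30 ms for a key press; 27 is Escape in the ASCII table
        if cv2.waitKey(30) == 27:
            break


if __name__ == "__main__":
    main()
```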
Before we can use supervision utilities, we need to install supervision as our next dependency. After that is done, we can import supervision as sv in our script and create an instance of the bounding box annotator; we will pass the thickness of the line, the thickness of the text, and the text scale. Inside the while loop, below the model inference, we'll convert the YOLOv8 results into supervision detections and use them to annotate the current frame. So we call bounding box annotator annotate, pass the current frame as the scene and the detections as detections, and run the script in the terminal. Looks like we need to select the first element from the returned list; we run the script once again, and after a few seconds we should see the first frame of the stream along with the bounding boxes and probabilities.

I guess it would be cool to map those detections to the concrete classes detected by the model, so let's do that. The bounding box annotator's annotate method can take an additional optional argument called labels. To produce them, we'll loop over the detections inside a list comprehension; every entry is a tuple that contains the confidence and the class ID, among other things. Now we'll use string formatting to map the class ID to a class name using the dict stored inside the model under the names property. Like previously, we'll also print the confidence rounded to the second digit after the decimal point. Now we can pass our labels as an additional argument and rerun the script in the terminal, and after a few seconds we will hopefully see a frame from the stream with bounding boxes and class names. Cool.

Now we can get a bit more creative and use our detections to build a slightly more complicated video analytics system. We start by defining a zone polygon in the form of a NumPy array; in our case it will be a rectangle that occupies the left side of the screen. We actually have a separate tutorial showing you how to define the geometry of the polygon; you can find it in the description and in the card in the top right corner. Right next to the while loop we will define an instance of polygon zone. We will use our geometry as the polygon argument, and we will also pass a frame resolution; we are in luck, because we get that resolution as one of the arguments from the argument parser. On top of that, we create an instance of the zone annotator; this is the class we use specifically to draw zones on the frame. To create that instance, we need to pass the instance of the zone and define the color that we would like to use for annotation. Now, inside the while loop, we can first of all trigger the zone and second of all annotate the frame with our zone, so we pass the frame as the scene, and we are more or less ready to test our solution.

The model gets loaded and we see the view from the camera. It looks like we messed up something in the zone definition; it occupies almost the whole screen instead of half, but the counting works properly. So let's go back to the editor and fix the zone. Yeah, obviously we used incorrect values for the width; only for the width. We can now rerun the script. Like always, we need to wait a little bit for the model to load, and right now the zone only occupies half of the screen. There are only scissors in the zone; now it's an apple and scissors, two objects. Cool.

Okay, now time for some improvements. Let's reformat the polygon zone annotator a little bit and add the thickness of the line, the thickness of the text, and the text scale to make the counter a little bit larger, because it was quite hard to see with everything going on in the scene. So let's rerun the script once again. Oh yeah, it's a lot better; we can clearly see the number now.
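A sketch of the annotation and zone logic described above. Supervision's API has changed considerably since early 2023 (for example, `Detections.from_yolov8` was later renamed and `BoxAnnotator` was split into separate box and label annotators), so this follows the API the video appears to use; the lines marked as loop code slot into the while loop of the previous sketch, after the inference call.

```python
import numpy as np
import supervision as sv

# hard-coded zone covering the left half of a 1280x720 frame
ZONE_POLYGON = np.array([
    [0, 0],
    [640, 0],
    [640, 720],
    [0, 720],
])

# set up once, before the while loop
box_annotator = sv.BoxAnnotator(thickness=2, text_thickness=2, text_scale=1)
zone = sv.PolygonZone(polygon=ZONE_POLYGON, frame_resolution_wh=(1280, 720))
zone_annotator = sv.PolygonZoneAnnotator(
    zone=zone,
    color=sv.Color.red(),
    thickness=2,
    text_thickness=4,  # larger text so the counter is easy to read
    text_scale=2,
)

# inside the while loop, after `result = model(frame)[0]`
detections = sv.Detections.from_yolov8(result)

# every entry is a tuple of (xyxy, confidence, class_id, tracker_id)
labels = [
    f"{model.model.names[class_id]} {confidence:0.2f}"
    for _, confidence, class_id, _ in detections
]
frame = box_annotator.annotate(scene=frame, detections=detections, labels=labels)

zone.trigger(detections=detections)  # counts the detections inside the zone
frame = zone_annotator.annotate(scene=frame)
```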
The next thing that I would like to fix is the fact that the person class shows up whenever I'm grabbing something, so let's filter the person class out. We can do it easily with supervision's pandas-like syntax. Let's rerun once again, and now when I'm grabbing stuff, the person class is not visible. I can take all that stuff out, I can put the apple in, and the person class is not interfering with the counter. That's a lot cleaner than before. Great.

Now I decided to mix things up a little and bring new objects to the scene, so let's take a look at how our script performs with apples and oranges. Let's restart the script and bring a few new actors to our scene. Okay, so like I said, orange, apple; so far so good. A little bit of weird behavior when the apples are close to each other, and now we are getting something strange: the orange is oscillating between the orange and apple classes. What we saw there was actually not oscillation between two different classes; it was double detection, meaning that a single object produced two bounding boxes with two different classes. We can prevent that from happening by using global, class-agnostic NMS (non-max suppression). In short, that means that whenever we get a double detection and the intersection over union of the two bounding boxes is sufficiently high, even when those two bounding boxes belong to two different classes, we will get rid of one of them, the one with lower confidence. We can add an additional argument to our inference logic; it is called agnostic nms, and we set it to true. When we rerun the script and the model is loaded into memory, we see that we no longer experience double detection, although our orange is mostly detected as an apple. But from the counter's perspective it's actually much better, because we see the correct number of items in the zone.

We are almost done; there's one more thing that I would like to change. You most likely noticed that we pretty much hard-coded the definition of our polygon zone. That works fine when we use the same frame resolution, but if we passed a different frame resolution, the zone could end up, for example, significantly larger than the whole frame. So I'd say it would be pretty cool to go from a hard-coded definition to a relative definition and recalculate the size of the zone after we start the script. First things first, let's test our hypothesis. We can use our webcam resolution argument to pass different values, for example 640 by 360, and run the script. The model loads, and we see that the zone is only half visible; it spills over the whole frame, and we see only a portion of the counter. So let's fix that. Like I said, we can remove the hard-coded values and introduce relative ones. I want my zone to have half of the width of the frame and the full height of the frame. Consequently, we need to recalculate the real size of the polygon, so we take our relative dimensions, multiply them by the frame resolution, convert that into a NumPy array, and cast all of it to int. Now we can take that value and inject it into the polygon zone. Okay, let's run the script with the smaller resolution, and it works beautifully; our zone is now recalculated to fit the resolution of the frame.

I want to do one more experiment: let's manipulate our script to only count apples in the zone, so basically discard objects from any other class. It should be pretty easy; there's only one small change we need to make in our filtering logic. Instead of discarding people, we need to discard any object that is not class 47, which is apple.
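The three improvements described above are small, localized changes to the script built up in the earlier sketches (same early-2023 supervision API assumptions); roughly:

```python
import numpy as np

# 1. filter the person class out (class 0 in the COCO label map),
#    using supervision's pandas-like indexing on Detections
detections = detections[detections.class_id != 0]

# 2. class-agnostic NMS at inference time, so overlapping boxes are
#    merged even when they carry different class labels
result = model(frame, agnostic_nms=True)[0]

# 3. a relative zone definition: fractions of the frame instead of pixels
ZONE_POLYGON = np.array([
    [0, 0],
    [0.5, 0],  # half the width...
    [0.5, 1],  # ...and the full height of the frame
    [0, 1],
])
# rescale to pixels once the actual resolution is known
zone_polygon = (ZONE_POLYGON * np.array(args.webcam_resolution)).astype(int)
```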
So when we run the script, that's exactly what happens: we only detect apples. Despite the fact that we have oranges in our zone, we only count apple objects. Right now we don't have any apples in the zone, but we have two oranges; now there is nothing; now there is a single apple, now two apples, then three, and so on.

Boy, that turned out to be a pretty long video. I have a lot of respect for you if you are still here. I hope that you learned something today, because I certainly did, just playing with the live video stream and tweaking the script to add one more functionality; I couldn't stop myself from doing one more, and then one more. Still, I hope that all those people who were asking for a live webcam tutorial for YOLOv8 are satisfied. If you found this video useful, make sure to like and subscribe, and stay tuned for more computer vision content coming to this channel soon. My name is Peter. Bye!
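For reference, a sketch of the complete script as assembled over the course of the video, under the same assumptions as the snippets above (early-2023 supervision API, yolov8l.pt weights, apple = COCO class 47):

```python
import argparse

import cv2
import numpy as np
import supervision as sv
from ultralytics import YOLO

# relative zone: left half of the frame, full height
ZONE_POLYGON = np.array([
    [0, 0],
    [0.5, 0],
    [0.5, 1],
    [0, 1],
])


def parse_arguments() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="YOLOv8 live")
    parser.add_argument(
        "--webcam-resolution", default=[1280, 720], nargs=2, type=int
    )
    return parser.parse_args()


def main():
    args = parse_arguments()
    frame_width, frame_height = args.webcam_resolution

    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, frame_width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, frame_height)

    model = YOLO("yolov8l.pt")

    box_annotator = sv.BoxAnnotator(thickness=2, text_thickness=2, text_scale=1)

    # scale the relative polygon to the actual frame size
    zone_polygon = (ZONE_POLYGON * np.array(args.webcam_resolution)).astype(int)
    zone = sv.PolygonZone(
        polygon=zone_polygon,
        frame_resolution_wh=tuple(args.webcam_resolution),
    )
    zone_annotator = sv.PolygonZoneAnnotator(
        zone=zone,
        color=sv.Color.red(),
        thickness=2,
        text_thickness=4,
        text_scale=2,
    )

    while True:
        ret, frame = cap.read()

        # agnostic_nms suppresses double detections across classes
        result = model(frame, agnostic_nms=True)[0]
        detections = sv.Detections.from_yolov8(result)
        # count only apples; use `!= 0` instead to just drop the person class
        detections = detections[detections.class_id == 47]

        labels = [
            f"{model.model.names[class_id]} {confidence:0.2f}"
            for _, confidence, class_id, _ in detections
        ]
        frame = box_annotator.annotate(
            scene=frame, detections=detections, labels=labels
        )

        zone.trigger(detections=detections)
        frame = zone_annotator.annotate(scene=frame)

        cv2.imshow("yolov8", frame)

        if cv2.waitKey(30) == 27:  # Escape
            break


if __name__ == "__main__":
    main()
```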
Info
Channel: Roboflow
Views: 53,971
Keywords: yolov8, yolov8 neural network, yolov8 custom object detection, yolov8 object detection, yolov8 tutorial, object detection yolo, object detection pytorch, object detection python, opencv object detection, opencv yolov8, opencv python yolov8, object detector, object detection yolov8, opencv, detect objects with yolov8, yolov8 opencv, opencv neural networks, deploy yolov8 model, how to deploy yolov8, yolov8 counting
Id: QV85eYOb7gk
Length: 19min 38sec (1178 seconds)
Published: Tue Feb 14 2023