Count People in Zone | Using YOLOv5, YOLOv8, and Detectron2 | Computer Vision

Captions
Hello, long time no see! I was super busy; my last video on the Roboflow channel was over two weeks ago, and back then I showed you how to detect, track, and count objects moving through a line. The overall reception of that video was super positive, so thank you very much for that. By the way, you can watch that video via the link in the description below and in the card in the top right corner. Anyway, that motivated me to double down on supervision and add even more useful functionality to the library, and today I will show you what I've been working on for the past two weeks. The topic of today's video is detecting and counting objects in a zone, another very popular use case in computer vision. And I heard you guys screaming in the comments under the last video that you want more models, so today I will show you how to use YOLOv5, YOLOv8, and Detectron2 Mask R-CNN with supervision.

Before we start, make sure to like and subscribe. I saw the stats, and I know that 75% of you who watched the previous video were not subscribers. So if you watched the previous video and liked it, and you are watching the current video and liking it, maybe it's worth subscribing to stay up to date with whatever we are doing over here. Just a suggestion. Anyway, without further ado, let's jump into the code.

You most likely already know this, but we start at the Roboflow notebooks repository, which is basically the place where we store all the Jupyter notebooks we use in our YouTube tutorials. This time we scroll to the computer vision skills section and select the second notebook from the top, "detect and count objects in polygon zone", which you can conveniently open in Google Colab. Before we start, let's create a copy and save it on our Google Drive. That takes a little bit of time, but then we can open the copy in a new tab and close the original notebook. There is one more thing: we need to confirm that we have access to a GPU in our runtime. Cool, we are basically ready to go.

First things first, we need to set up our working environment. In our case that means two things: installing all the libraries and downloading the video files we will need in our demo examples. Just to have a clear conscience, I will run nvidia-smi; we already went into the settings, so we can expect to have access to the GPU, but I decided to double-check. Second, we can print information about our environment. That is very handy if you would like to submit a bug report: just copy and paste it into the report, and it will be much easier for me to debug any problems you face.

Now we can go straight into the installation part. Like I said in the intro, we'll be using three different libraries: YOLOv5, YOLOv8, and Detectron2. I will be using models pre-trained on the COCO dataset because that's fine for my use case; however, if your use case is quite specific and would require retraining the model on a custom dataset, we already have tutorials showing how to do that for each of those models, and I will link them in the description.

Okay, enough talking, back to the code. In the case of YOLOv5, we just need to pull the repository to our local drive and install the requirements.txt inside; that should be pretty fast. YOLOv8 is even simpler because it's already distributed via pip, so we just install the ultralytics package and the installation is done in no time. Those installations are super fast in the Google Colab environment, because most of the dependencies required for YOLOv5 and YOLOv8 are already installed there; if you installed them locally, it would most likely take a little longer. The Detectron2 installation takes the longest of them all, because a lot needs to happen, so I will just use the magic of cinema.
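For readers following along, the setup cells described here boil down to roughly the following; treat this as a sketch, since the notebook's exact Detectron2 install command depends on the CUDA and PyTorch versions available in Colab at the time:

    # confirm GPU access in the Colab runtime
    !nvidia-smi

    # YOLOv5: clone the repository and install its requirements
    !git clone https://github.com/ultralytics/yolov5
    !pip install -r yolov5/requirements.txt

    # YOLOv8: distributed via pip as the ultralytics package
    !pip install ultralytics

    # Detectron2: built from source, which is why it takes the longest
    !pip install 'git+https://github.com/facebookresearch/detectron2.git'

    # supervision: still in beta, so freeze the version used in this tutorial
    !pip install supervision==0.2.0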
Perfect. Now we just need to install supervision. I want to remind you that we are still in beta; I guess the official release will come somewhere in March, so until that point make sure to freeze the version of supervision that you are using, because we may introduce breaking changes. For this tutorial, the version we should aim for is 0.2.0. The last point on our to-do list is downloading the video examples. Those videos are hosted on my Google Drive, so we just need to go through those cells and download the files; nothing really interesting happens here. Installing those libraries always takes a little bit of time, but we are now ready to start with the first example: detecting customers in a shopping mall alley. I decided to draw a zone on the floor so that whenever a customer enters the alley, the counter goes from zero to one. Spoiler alert: it didn't really work out as I expected.

We start with the YOLOv8 example, using the SDK that comes with the package. The first cell imports YOLO from the ultralytics package and loads the selected weights into memory, in my case the small version of the model. Of course, YOLOv8 also comes with a CLI that is very handy if you just want to run the model on a specific image or video; however, the moment you want to do something more custom, like counting how many detections fall inside a specific zone, you need to build your own Python script, use the SDK, and recreate the whole inference pipeline on your own. Fortunately, you can use supervision to speed up that process.

Let me quickly go through the vanilla inference pipeline that I created for you. I start by importing supervision as sv. By the way, I will re-import supervision for every code snippet, but that's only because I wanted those snippets to be standalone examples that you can run separately; you of course don't need to re-import supervision every time. Then I use get_video_frames_generator, one of our video utilities, to return the first frame from the video. You can use that utility to loop over every frame of the video, but for the time being I just want to experiment on the first frame. And by the way, here is our documentation, where you can learn a lot more about the utilities we built for you. Every frame is a NumPy array, so we can inject it straight into the model and get our predictions. Next we need to convert those predictions into something understandable for supervision, and we can conveniently use the from_yolov8 method that is part of the Detections class; those detections are now usable within the supervision framework, and we can, for example, annotate our frame with them. For now we offer easy conversion from YOLOv5 and YOLOv8 results; today I will also show you how to do the conversion manually when we work with Detectron2.
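A minimal sketch of that vanilla single-frame pipeline, assuming the supervision 0.2.0 API used in the video (Detections.from_yolov8 was renamed in later releases) and a hypothetical mall.mp4 filename for the downloaded example video:

    import supervision as sv
    from ultralytics import YOLO

    MALL_VIDEO_PATH = "mall.mp4"  # hypothetical name for the downloaded example

    # load the small YOLOv8 variant, pre-trained on COCO
    model = YOLO("yolov8s.pt")

    # pull just the first frame out of the video to experiment on
    generator = sv.get_video_frames_generator(MALL_VIDEO_PATH)
    frame = next(iter(generator))

    # every frame is a numpy array, so it goes straight into the model
    results = model(frame)[0]

    # convert ultralytics results into a supervision Detections object
    detections = sv.Detections.from_yolov8(results)

    # draw the detections back onto the frame
    box_annotator = sv.BoxAnnotator(thickness=4, text_thickness=4, text_scale=2)
    annotated_frame = box_annotator.annotate(frame.copy(), detections=detections)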
In general, you can expect a lot more of those connectors to be added to the library in the future; long term, we would like to be compatible with every major computer vision framework out there. If we take a look at our results for the first frame, we immediately see that we detected a lot more than just people, so now I would like to filter out detections that are not related to the person class. When we scroll a little lower, there is another code snippet, and there are two key differences between it and the previous one. The first difference is the filtering: we added NumPy- and pandas-like filtering to supervision, so you can simply state a logical condition. In my case, I want the detections' class ID to be equal to zero, which is the class ID for person in the COCO dataset; I pass that logical condition in square brackets to the Detections object and get a new Detections object containing only the person class. The second difference is that the previous snippet only printed the confidence, which wasn't really helpful, so I added a custom label, in my case the class name plus the confidence. Here is the result of the second code snippet. So far, so good.

Now we can move our focus to creating a zone on the floor. I would like a polygon zone that starts here in the corner of the alley, runs through the whole length, then across, then once again through the whole length, and finishes here. Basically, whenever a person is in the alley, we check where the bottom center of their bounding box is located, and if that bottom center is inside the polygon, we increase the counter. If we scroll a bit lower, that's what I did in the next snippet. Defining a polygon zone is basically a manual process: I measured where on the frame the zone should be located and defined it as a NumPy array, where every row is a single point, the first coordinate being x and the second y. Then we use that to define the polygon zone itself. There are only two key pieces of information to pass: the polygon geometry and the frame resolution. You can type the resolution in manually if you already know the dimensions of your frame, or you can use another of our utilities, video info, to extract it automatically; that's exactly what we did in our code snippets, first defining the video info and then reading the width and height right from that object. Now we just trigger our newly defined zone with our detections, and that internally updates the current count of objects within the zone. Cool. Right now we see that the person has been detected, and the counter in the middle of the zone shows that there is one person in it.

Then I thought to myself: wouldn't it be cooler if the edge of the zone were somewhere in the middle of the alley, so that the counter would initially show zero and only go up by one when the customer enters the zone? That's exactly what I did. I moved the edge of the polygon that was initially at the very end of the alley to the middle, which obviously forced me to redefine the geometry of the polygon, but that was pretty straightforward. As a result I got a new visualization: the zone is much smaller, covering only about half of the alley, the counter shows zero, and the person is initially outside of the zone.
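Continuing the sketch above, the class filter, the custom labels, and the polygon zone could look roughly like this; the polygon coordinates are placeholders, since the real ones were measured against the actual frame:

    import numpy as np

    # keep only the person class (class_id 0 in the COCO dataset)
    detections = detections[detections.class_id == 0]

    # custom labels: class name plus confidence instead of confidence alone
    labels = [
        f"{model.model.names[class_id]} {confidence:0.2f}"
        for _, confidence, class_id, _ in detections
    ]
    annotated_frame = box_annotator.annotate(frame.copy(), detections=detections, labels=labels)

    # zone geometry, measured manually: every row is one (x, y) vertex
    polygon = np.array([[100, 700], [100, 100], [450, 100], [450, 700]])  # placeholder values

    # the frame resolution can be typed in manually or read from the video file
    video_info = sv.VideoInfo.from_video_path(MALL_VIDEO_PATH)
    zone = sv.PolygonZone(
        polygon=polygon,
        frame_resolution_wh=(video_info.width, video_info.height),
    )
    zone_annotator = sv.PolygonZoneAnnotator(zone=zone, color=sv.Color.white())

    # trigger() checks which bottom centers fall inside the polygon
    # and updates the zone's internal counter
    zone.trigger(detections=detections)
    print(zone.current_count)
    annotated_frame = zone_annotator.annotate(annotated_frame)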
The next step is to move from single-frame processing to full-video processing, and to do that easily we will use the process_video function. That function comes with the supervision library and takes three arguments: the source path, the target path, and the process-frame callback. To use it, we refactor our code and move all the logic that was used to process a single frame into that callback. If we take a look at the last code snippet, the definition of the polygon is exactly the same; we define exactly the same zone and exactly the same annotators. The only things moved into the callback are the inference part, the filtering part, and the annotation part. Now we can just process the whole video and take a look at the result. [Music] Yeah, this is something that I didn't expect: apparently the trolley is so large that it occludes the whole person, the bounding box moves higher, and then the bottom center of the bounding box is no longer inside the polygon, which is essentially nerd talk for "computer says no". We could fix that, but we would most likely need to train a custom model and, instead of detecting people, detect trolleys in the zone.

So let's move to demo number two. In the second demo we'll be looking at people standing on a subway platform, and we would like to detect whether they stand too close to the edge. To do that, we'll create a thin zone very close to the edge, and whenever somebody is inside that zone, the counter will increase. This time, just for fun, we'll use Detectron2. We already installed the library at the very beginning of the video, so now we just need to load the Mask R-CNN model into memory, and then we can go straight to the first example. Once again this is the vanilla inference pipeline, exactly the same setup as in the previous example, with one exception: instead of YOLOv8 we are using Detectron2. When I run the cell, we see all detections for the first frame of the video, and that shows you the strength of supervision: we changed just two lines to swap one model for another, and the rest of the processing pipeline is reusable and stays the same.

Straight away we notice the same problem as in the previous examples: there are a lot of additional classes that we would like to filter out. On top of that, because the scene is super tight, I think it is much better to skip the labels on the bounding boxes altogether; we will only work with the person class, so they don't really matter to us. If we scroll a little lower to the second code snippet, we see the familiar filtering from the previous example, but also some small changes in the annotation part: for the bounding box annotator we passed an additional parameter called skip labels, which removes the text on top of the bounding boxes. Here is the resulting frame. I think it looks much cleaner, especially in the top part of the image, where previously we saw only text.

Now it's time to define the zone. Like I said, I would like a thin zone that goes all the way to the end of the station and back, and whenever a person enters it, meaning the center of their bounding box falls inside the zone, the counter should increase. When we scroll a little lower to the next code snippet, we once again see something familiar: I added a polygon NumPy array that defines the geometry of our zone, and I defined a polygon zone. This is all exactly what we've done previously; the only differences are a different frame, a different zone, and a different detector, but it's more or less exactly the same logic.
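Since Detectron2 has no ready-made connector yet, here is one plausible shape for the manual conversion and the callback-based full-video processing, assuming a standard Detectron2 Mask R-CNN setup and a hypothetical subway.mp4 filename; the skip_labels flag is spelled the way it is named in the video, and its exact spelling may differ between supervision releases:

    import numpy as np
    import supervision as sv
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    SUBWAY_VIDEO_PATH = "subway.mp4"  # hypothetical name for the downloaded example

    # standard Detectron2 setup for a COCO-pretrained Mask R-CNN
    CONFIG = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(CONFIG))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(CONFIG)
    predictor = DefaultPredictor(cfg)

    # thin zone along the platform edge; placeholder coordinates
    video_info = sv.VideoInfo.from_video_path(SUBWAY_VIDEO_PATH)
    polygon = np.array([[50, 950], [1870, 950], [1870, 1060], [50, 1060]])
    zone = sv.PolygonZone(polygon=polygon, frame_resolution_wh=(video_info.width, video_info.height))
    zone_annotator = sv.PolygonZoneAnnotator(zone=zone, color=sv.Color.white())
    box_annotator = sv.BoxAnnotator(thickness=4)

    def process_frame(frame: np.ndarray, i: int) -> np.ndarray:
        # no from_detectron2 connector, so build the Detections object by hand
        instances = predictor(frame)["instances"].to("cpu")
        detections = sv.Detections(
            xyxy=instances.pred_boxes.tensor.numpy(),
            confidence=instances.scores.numpy(),
            class_id=instances.pred_classes.numpy().astype(int),
        )
        detections = detections[detections.class_id == 0]  # person only
        zone.trigger(detections=detections)
        frame = box_annotator.annotate(frame, detections=detections, skip_labels=True)
        return zone_annotator.annotate(frame)

    sv.process_video(
        source_path=SUBWAY_VIDEO_PATH,
        target_path="subway-result.mp4",
        callback=process_frame,
    )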
Obviously, for the first frame there is nobody standing right on the edge of the station, so the counter shows zero, but when the train arrives we should see that value change. Once again we refactor our code snippet to process the whole video instead of a single frame. If we scroll a little lower, we define a process_frame function that we inject as the callback into process_video, we move our detection, filtering, and annotation logic into that function, and we are ready to process the whole video. At the very beginning nobody is in the zone, and then the train arrives. Funny thing: we even detected the driver of the train. [Music] It takes a little bit of time for the train to stop, but then the zone counter finally changes to one; the zone is most likely triggered by somebody in the further part of the train. [Music] Then our counter goes up by a lot, because multiple people are leaving and entering the train at the same time, and when the last person exits the station, the counter goes back to zero. Well, that went a lot better than the previous example. Public transport is actually a typical use case for zone analysis: very often you would like to know how many people are on the platform and whether they are standing where they are supposed to stand, things like that.

It's getting dark outside, so let's quickly take a look at the third and final example for today. This time we'll look at a market square divided into zones and calculate how many people are in each of them. As promised, we are using yet another model, this time YOLOv5, and I will use the Torch Hub version because it's the easiest to load into memory without too much headache. Let's run the vanilla inference pipeline, exactly the same as before, except with YOLOv5. If we look at the implementation, once again I changed just two lines: one to run inference with a different model and one to convert the results coming from YOLOv5 into detections understandable by the supervision framework. Kid's stuff. Our scene is completely flooded with detections, both inside and outside the person category, so once again we need to filter. I decided it's a perfect opportunity to show you how to chain logical conditions: instead of filtering only by class ID, we also filter out every detection with confidence lower than 50 percent. I chained those two logical conditions, passed them into the detections filter, got my detections filtered by both conditions, and plotted them on the frame. A cool feature is that I can also call Python's len function on the detections and get the number of detections on the whole frame; in this case it's 64.
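A sketch of the YOLOv5 variant of the same pipeline, again under the 0.2.0-era converter names and with a hypothetical market-square.mp4 filename:

    import supervision as sv
    import torch

    MARKET_VIDEO_PATH = "market-square.mp4"  # hypothetical name

    # load YOLOv5s straight from Torch Hub
    model = torch.hub.load("ultralytics/yolov5", "yolov5s")

    generator = sv.get_video_frames_generator(MARKET_VIDEO_PATH)
    frame = next(iter(generator))

    results = model(frame)
    detections = sv.Detections.from_yolov5(results)

    # chain logical conditions with &: person class AND confidence above 50%
    filtered = detections[(detections.class_id == 0) & (detections.confidence > 0.5)]

    # Detections supports len(), so counting everything on the frame is one call
    print(len(filtered))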
Now let me show you how to filter detections not only by class and confidence but also by whether they are inside or outside of the zone. If we scroll a little lower to the next code snippet, we see things that are both familiar and unfamiliar. Obviously we have the polygon and the zone; that's something we've already done in two previous examples. But this time there is a new line: instead of just triggering the zone, we also capture the result of the trigger, and that result is another mask we can use for filtering (see the sketch at the end of the captions). So instead of chaining only two logical conditions, we also chain them with the mask coming from the trigger; in effect we keep only people with confidence higher than 50 percent who are inside the zone. That's exactly what we see in the output: even though the market square is full of people, we only visualize those inside our zone; the rest are simply ignored. The zone you use for filtering or counting doesn't need to be rectangular; it can have any arbitrary shape and as many vertices as you want. You are also not limited to one zone per scene: you can have multiple zones and filter the detections by each of them. All that flexibility lets you create not only very useful but also very nice-looking visualizations.

And that's all for today. I hope you enjoyed the video and found it useful; I certainly enjoyed making those demos and building this functionality into supervision. As usual, like and subscribe, and maybe leave a star on the supervision repository; nothing motivates me as much as seeing that somebody actually installed and tried to use the library. Let me know what else you would like to see in it, and in the meantime, stay tuned for more computer vision content coming to this channel. My name is Peter, bye!
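One last sketch before the video stats: the zone-membership filter from this final example, continuing the snippet above; the zone definition mirrors the earlier examples, with placeholder coordinates:

    import numpy as np

    # define the zone as before
    video_info = sv.VideoInfo.from_video_path(MARKET_VIDEO_PATH)
    polygon = np.array([[0, 400], [1920, 400], [1920, 1080], [0, 1080]])  # placeholder values
    zone = sv.PolygonZone(polygon=polygon, frame_resolution_wh=(video_info.width, video_info.height))

    # capture the result of the trigger instead of discarding it
    mask = zone.trigger(detections=detections)

    # chain class, confidence, and zone membership in a single filter
    in_zone = detections[(detections.class_id == 0) & (detections.confidence > 0.5) & mask]

    # nothing limits you to a single zone: trigger each one separately
    zones = [zone]  # in a real scene this list could hold many differently shaped zones
    counts = [len(detections[z.trigger(detections=detections)]) for z in zones]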
Info
Channel: Roboflow
Views: 37,062
Keywords: yolov5, detectron2, yolov8, supervision, zone, detection, detection and counting, detection and counting zone, counting people, counting objects
Id: l_kf9CfZ_8M
Length: 21min 26sec (1286 seconds)
Published: Wed Feb 08 2023