YOLOv9 Instance Segmentation on Live Webcam and YOLOv8 Comparison

Video Statistics and Information

Captions
Hey guys, welcome to a new video. In this video we're going to take a look at the new YOLOv9 model and see how we can do instance segmentation with it, which it now supports. Previously I made a video where we train a custom YOLOv9 model. This is the new version that was just released in the YOLO family: we have YOLOv8, YOLOv5, and all the different variations and versions of YOLO, but now we have YOLOv9 as well, and it supports instance segmentation, so it's pretty cool. We're going to run through a bunch of videos, but we're also going to see how we can run it on a live webcam. In one of the upcoming videos we're definitely going to train a custom YOLOv9 model for instance segmentation, so stay tuned for that.

Let's jump straight into the YOLOv9 GitHub repository and scroll down to see the instance segmentation models that have just been released. This repo for YOLOv9 already has 7.4k stars. If we scroll down a bit, we can see the comparisons. Again, I have a video where we go over all of this: how you can run the model, and how you can train a custom YOLOv9 model on a custom dataset, where we create the whole computer vision training pipeline with Roboflow, so definitely check that video out. Here we can see the comparison. They state that it is state of the art and also significantly faster compared to some of the other models. Taking a look at it, we have both the YOLOv9 and the GELAN models, which are the new models coming out. If you compare with YOLOv8, which is the light purple one, we get significantly better performance and also faster inference speed, or at least a lower number of parameters. So I'm really excited to test this out, and now we can do more side-by-side comparisons between YOLOv8 and YOLOv9. Initially I think that YOLOv8 is still a bit better
compared to YOLOv9, and again it really depends on your dataset, so definitely test out both models. But let's see how it does with instance segmentation. For object detection I still think that YOLOv8 is slightly better, and it was also significantly faster when I was running it, but that was with no optimizations. Here we can see the performance. Again we have the different variations: tiny, small, medium, and then the C and E models for the GELAN versions. So this is pretty cool, and we have the average precision on the validation set, the number of parameters, and the floating-point operations (FLOPs). So you can directly go in and take the models here. We can also use the Ultralytics framework; everything is built into that already, so you can use it just as we were doing with a YOLOv8 model.

If we go a bit further down, we see some training and so on. I already have a Google Colab notebook covering that. Here we can see how to run inference with the detect.py script, but again, everything is also already integrated into Ultralytics. Right now we can see the panoptic segmentation here; that is also an option, which we're going to look at in one of the upcoming videos. That is also pretty cool: we basically segment out every single pixel in the whole image, whereas with instance segmentation, which we're going to look at in this video (I'm just going to zoom in here a bit), you're able to extract all the individual objects that you want to segment out. If you scroll a bit further down, we can see that we have object detection with gelan-c-det.pt. We have object detection, this is how we can run the training script, and we can see the different variations, GELAN and also YOLOv9. Now we also have instance segmentation, and if we scroll all the way down to the bottom, we also have panoptic segmentation, where we segment out every single pixel.

If you're using panoptic segmentation and you combine it with a depth map, you actually get a really good 3D understanding of your whole environment, so I'm definitely going to put out some videos about that. We can actually go in and build a voxel grid of our whole environment. Let's say that for every single pixel we know the depth value, but we also know the class, meaning what that pixel actually is, for the whole camera. If you have multiple cameras, you can stitch them together, and you can even create point clouds and so on, so we get this voxel grid. You can do a bunch of cool things in the computer vision space with panoptic segmentation. If you look at Tesla and all the humanoid robots and so on, they're basically navigating around on a panoptic segmentation map plus depth values. So we have panoptic segmentation. You can also do image captioning, but we are not going to take a look at that for now; the most important things are panoptic segmentation and instance segmentation.

This is how you can train a custom model. Again, you can go in and fine-tune these models on your own dataset. I'm going to show you how you can create that whole pipeline with Roboflow, how you can export the dataset from there, and so on. But we can also jump into the Ultralytics documentation, where they have the YOLOv9 model integrated directly. We can get a short overview of it: you can read through the overview and get an introduction to it, and you can also see the core innovations and so on if you want to read more details about what architectural changes have been made, and also how it compares to
YOLOv8. If you scroll a bit further down, we can see the performance again; it's the exact same thing. But we're going to get a code snippet in just a second, right here. So we have usage examples: we can use predict, train, val (validation), and also export, as we're used to with all the other YOLO models in here, especially YOLOv8 from Ultralytics. These are the supported modes and tasks. Right now we can do object detection and instance segmentation directly with Ultralytics. They don't have panoptic segmentation yet, and I'm not sure if that's going to be added, but we're definitely going to cover that in another video, so hit the subscribe button under the video and also the bell notification, because we're definitely going to do that. It's so cool that we can segment out every single pixel.

So that's pretty much it. We can just take this code snippet and throw it into our own custom Python script. We're going to run through a couple of videos and also test it out on a live webcam. Let's jump straight into the Python script. I've just copy-pasted the code snippet. Again, it's really just set up with Ultralytics: the only thing we have to do is from ultralytics import YOLO, and then we can set up our model and create an instance of it. Right now we need to specify yolov9c-seg.pt, and it's going to automatically download the model file to your computer if you're running it for the first time. After that it just loads the model and creates an instance of it. We can get the information about our model, and then we can use the predict function from Ultralytics.

We can throw in different sources: it could be an image that we extract from our web camera if we're using it in our own custom application or Python script, but we can also directly throw in a full video. It could be a NumPy array or a PIL image, or even a webcam, meaning the index of your webcam. There are a bunch of different arguments we can pass to the predict function as well. Right now we're just going to show the results and also save them, but we can also set the confidence score directly if you want to use that. We can also return the results if you want to extract them and so on. There are some videos in the Ultralytics documentation, but I also have videos on my own channel about how you can set up a custom class, extract the results, and use them in your own application and project: how to extract the bounding boxes, the classes, the confidence scores, and pretty much everything within the results class that gets returned. This is a generator, so it will just keep returning results as we're doing predictions.

So we pretty much have everything right now. We could print the results, but right now we're just going to visualize them and see how this model performs. You can just swap this out with YOLOv8, and this is how you can swap between the YOLOv8 and YOLOv9 models with Ultralytics. Again, we're also going to do some custom scripts, and I'm going to show you how you can extract everything, so stay tuned for that as well. But right now let's go down and run the Python script. I have a video that I'm going to show you: it's just some suitcases
running on a belt here at an airport. This is the video that we're going to pass through the model; let's see if we're able to segment out these different suitcases moving on the conveyor belt. After that we're going to test it on a live webcam and do some comparisons with YOLOv8 directly, which I think is pretty cool. So let's run it. For YOLOv9 segmentation we're going to use the YOLOv9-C model, and it's going to be downloaded automatically if you haven't run it before.

Right now I'm just going to drag the window over. We can see that we're detecting the suitcases with fairly high confidence scores. We can see that we actually missed them here at the end. Right now we get the person up here. These are the pre-trained models, so they will just detect classes from the COCO dataset. We have 80 classes: person, suitcase, car, truck, and all those standard classes. We can see that it runs at around 250 milliseconds per inference. I'm running this on a MacBook with an M2 chip, so this is running on the CPU right now. It will be significantly faster if you're running it on a GPU, but around four frames per second is not too bad on a CPU.

Now we can see it actually detects a backpack here. We don't really get any false predictions, and even though we have some very small objects at the end, it's still able to detect the suitcases. We have the backpack. Let's let it run for some more time. Here we detected a car, but again with a very low confidence score, around 27%, so we can definitely filter that out relatively easily by just having a threshold value. You can see that it detects a car again here, so we get some false predictions here and there. Just hit Q or Escape on your keyboard and you should be able to terminate it.

We can then test out the YOLOv8 model directly, and then we're going to run it on a webcam. The only thing that we need to swap out is the path to the model. If you're running
it for the first time, it will automatically download it. For YOLOv8 we also need to specify whether we want to use the medium, small, nano, or any of the other variations. I think it is the medium model that is comparable to the YOLOv9 model that is available for Ultralytics right now. You can see here that it starts to download the model, which is around 52.54 megabytes. It's going to do everything for you: with just a couple of lines of code you're able to do inference with a bunch of different models, and there are other variations in there as well.

Right now we can see that we get around 150 to 160 milliseconds of inference time, so it's significantly faster compared to the YOLOv9 model. It is not exactly twice as fast, but it is pretty damn close, and this was also my experience when I was running it for object detection. So the YOLOv8 models are still faster when we're just using the raw PyTorch models. It could be faster if you convert it to ONNX, TensorRT, and the other optimized frameworks, but I haven't tested that yet; I'm definitely planning to play around with that in the future.

Again we get pretty good results: we get the handbag, the backpack, and also the suitcases here at the end. We also get the person here and there. So the predictions look fairly nice. I also think that we get some false predictions here in the front with the car as well. I'm not sure if it was the car class, but we got some false predictions where it was basically just covering all of it. We can even see that these black flaps are sometimes detected as a suitcase, so that is definitely an error.

Now we're going to test it on a live webcam. We can just go in here and change the index, so right now we can just specify a zero. I just need to make sure that this is actually my webcam. I have two cameras connected right now, so it might not be able to boot it up. There we go, we can actually
see that. I'll just pull it over. So this is the webcam in front of me. Right now we're detecting a chair. We also got a bicycle here, which is actually the microphone, or a motorcycle; it's not really close to being that. But again, you can see there's a really nice mask around me here as a person. Remember, this is still the YOLOv8 model, with around the same inference time. I can turn around here, and you can see my phone. We have a mouse, keyboard, mouse, person; we get some pretty nice detections. Here we got a TV, but again the mask is very nice around it. You can even see it over here to the left in the recording tool: it detects me as a person with a fairly nice segmentation mask.

So let's terminate this one and try it out with the YOLOv9 model. There we go. We're just going to go back, and again we specify the zero index, so this is the webcam. You can pass in any source here: a NumPy array, a PIL image, a video file, an image file. You can even throw in a link to a YouTube video or a URL, and it's also going to run predictions on that. Right now it's going to open up the webcam. There we go. So again it's not as fast, but the segmentation mask is very, very nice. I can just try to move it around here. Let's see if we get the bicycle again. We're not getting the bicycle in this example here. Oh, we got one here, so it's a motorcycle, but again with a really low confidence score, probably around 0.50; not too low, but sometimes it is in the lower end. It's not really detecting the chair in the background right now, but let me try to tilt it up a bit further. Yeah, so it's not detecting the chair in the background as it did with the YOLOv8 model. So I probably still think that the YOLOv8 model is a bit better on this part as well. Let's try to move it around: we get the keyboard, mouse,
person. The mouse and so on are fairly good, and also the person over here and the laptop. Right now it actually detects this as a laptop, where before it was detected as a TV. We also get the TV here. So I think it really depends on the use case and the project that you're working on, so definitely test out both models on your dataset if you have multiple different videos. It feels like you get about the same accuracy, but the inference speed is faster for the YOLOv8 model, so definitely test that out. But again, you can optimize the models even further and export them into different formats; you can do that directly with Ultralytics as well.

So I think this is a good video just showing how you can use the YOLOv9 model for instance segmentation. Before, it was only possible with object detection. As for the panoptic segmentation, we're definitely going to take a look at that using the custom code from the YOLOv9 GitHub repository, so stay tuned for that. Check out the video on how you can actually use and fine-tune a custom YOLOv9 model. I think it's pretty cool that we have multiple models now that we can test out on our own applications and projects. Before this YOLOv9 model was released, I actually think that YOLOv8 was significantly better, both on performance and on inference speed, compared to any other models out there. But again, YOLOv9 is still building on top of YOLOv8 and so on, so there are still a lot of similarities, but we get some differences. Test it out on your own dataset. I have videos about all of it, and we're definitely going to cover it way more in the future. This was to give a high-level overview of how you can run it and use it with Ultralytics. I hope to see you guys in one of the upcoming videos. Until then, happy detecting.
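
The script is only described in the captions, not shown, so what follows is a minimal sketch of that workflow rather than the exact code from the video. It assumes the Ultralytics package is installed and uses its YOLO class, model.info(), and model.predict() with the show, save, conf, and stream arguments. The helper names (fps_from_latency_ms, keep_confident, run_segmentation) and the 0.5 default threshold are hypothetical and chosen for illustration; the two helpers only reproduce the arithmetic mentioned in the video (about 250 ms per frame is about 4 frames per second, and a 27% car detection falls below a threshold).

```python
from typing import List, Tuple

# (class name, confidence score); a hypothetical container used only in this sketch.
Detection = Tuple[str, float]


def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-frame inference time in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def keep_confident(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Drop detections whose confidence score is below the threshold."""
    return [(name, score) for name, score in detections if score >= threshold]


def run_segmentation(weights: str = "yolov9c-seg.pt", source=0, conf: float = 0.5) -> None:
    """Run a pretrained segmentation model on a source (0 = first webcam).

    Assumption: Ultralytics is installed; the weights are downloaded on first use.
    """
    # Imported lazily so the two helpers above work without Ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    model.info()  # print a short summary of the loaded model

    # stream=True makes predict() a generator that yields one result per frame.
    for result in model.predict(source=source, conf=conf, show=True, save=True, stream=True):
        if result.boxes is None:
            continue
        names = [result.names[int(c)] for c in result.boxes.cls.tolist()]
        scores = result.boxes.conf.tolist()
        print(list(zip(names, scores)))
```

Calling run_segmentation() would open the first webcam with the YOLOv9 weights. Passing a different weights file, for example run_segmentation("yolov8m-seg.pt"), is one way to make the YOLOv8 comparison; that filename is an assumption about which medium checkpoint was used, since the captions do not spell it out.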
Info
Channel: Nicolai Nielsen
Views: 1,907
Keywords: object detection yolo, object detection python, opencv, opencv dnn, Yolov9, How to Train Yolov9, Yolov9 Custom Object Detection Model, Yolov9 Object Detection, Yolov9 Tutorial, Opencv Yolov9, Object Detection with Yolov9, Deploy Yolov9, How to run inference with Yolov9, How to train custom object detection model, Yolov9 the new state of the art model, How to train custom Yolov9 model, Yolov9 vs Yolov8, Yolov9 Setup and Train, segmentation, YOLOv9-seg
Id: zaF1_z8QI9Y
Length: 14min 1sec (841 seconds)
Published: Fri Apr 05 2024