Mastering Object Detection in Images and Videos with YOLOv9 in Google Colab: A Step-by-Step Guide

Captions

YOLOv9 is a new computer vision object detection model released by Chien-Yao Wang and his team on 21 February 2024. In this tutorial, we will see how to do object detection on images and on videos using YOLOv9 in Google Colab. Before you run this script, please make sure that you have selected a T4 GPU as the runtime. Google Colab offers a free GPU, so you can use it.

In the first step, we will clone the YOLOv9 GitHub repo. Go to the repository page, copy its URL, come back to the Google Colab notebook, write git clone, and pass the URL you copied. When we run this cell, the GitHub repo is cloned into our Colab notebook, and you can see the yolov9 folder that we have cloned.

The YOLOv9 paper introduced two new architectures: YOLOv9 and GELAN. Model weights for both are available in the YOLOv9 repository released with the paper. If you open the v0.1 release, you will find the GELAN weights (GELAN-C and GELAN-E) as well as the YOLOv9 weights (YOLOv9-C and YOLOv9-E). YOLOv9 comes with four different models: YOLOv9 small, YOLOv9 medium, YOLOv9 compact, and YOLOv9 extended, and GELAN also comes with the same four models. In this tutorial, we will see how to do object detection with both the YOLOv9 and the GELAN models.

In the next step, we will download the pretrained model weights from the GitHub repo. At the time of writing this notebook, the weights for the small and medium models are not available. YOLOv9 comes with four models (small, medium, compact, and extended), but weights are only available for the compact and extended models. Similarly, for GELAN we have weights for the compact and extended models, but not for the small or medium models. To download them, copy the link address of a weight file, write wget in the notebook, and paste the link address after it; this downloads the weights into your Colab notebook. I am downloading the compact and extended weights for both GELAN and YOLOv9, so here you can see the YOLOv9 compact and extended weights and the GELAN compact and extended weights.

Next, we change our working directory to the yolov9 folder, which is the repository we cloned into the notebook. Then we install all the dependencies. The requirements.txt file lists all the libraries required to run detection on images and videos, and you can install them all at once with pip install -r requirements.txt. Running this cell installs every dependency listed in the file; it might take a few seconds to complete.

Now that we have installed all the packages listed in requirements.txt, in step five we will see how to run inference on images and videos. First, we run inference on an image using the GELAN-E model, the best model in the GELAN series. I first download a random image from my Google Drive into the Colab notebook. When I run this cell and open the yolov9 folder, you will find image1.jpg, which I have downloaded. Then I write python detect.py, pass the weights I downloaded above with --weights, the image name with --source, and --device 0, because I selected a GPU runtime. If you selected a CPU runtime, write --device cpu instead.

If you want to learn more about this command, go to the GitHub repository, where the inference steps are written out: to run inference with the YOLOv9 models you write python detect_dual.py, and to run inference with the GELAN models you write python detect.py. Since I am running inference on an image with the GELAN-E model, I write python detect.py and pass the weights file name, the source, and the device. That is all you need.

Inference takes a few seconds. In the output, you can see that we are using a Tesla T4 GPU and that our results are saved in the runs/detect/exp folder. If we open the yolov9 folder, the output image is saved under runs/detect/exp. To display it, I import Image from IPython.display, copy the path of the output image, and remove the leading yolov9 folder from the path, because yolov9 is already our current directory. Running this cell displays the output image. Using the GELAN-E (extended) model, we are able to detect the bus, persons, handbag, and backpack, and we have detected traffic lights as well. There is one wrong detection: this object is not a surfboard. Otherwise, all the detections look fine, and the model has detected persons that are far away or blurred.

Now we run inference on the same image using the YOLOv9-E (extended) model. The extended models are the best performing ones: they have more parameters and a higher mean average precision than the other YOLOv9 models, so we are using the best YOLOv9 model as well as the best GELAN model. As I mentioned, to run inference on an image or video with YOLOv9 we write python detect_dual.py. So here I write python detect_dual.py, pass the weights file name, select the same image, image1.jpg, as the source, and set the device to 0 for the GPU. Inference takes some time. When it finishes, you can see that a folder named exp2 has been created, and inside it we have the output image. To display it, you just pass its file path. With the GELAN-E model we had one wrong detection, the surfboard, but that is not the case with YOLOv9-E, so here YOLOv9-E outperforms GELAN-E: we do not have any false positives. We have the handbag, backpack, a bus, and persons, and it has detected persons that are quite far away or blurred, so this is good.

Now we will run inference on videos using the YOLOv9-E and GELAN-E models. First, let's download a sample video from Drive into the Colab notebook; I have already placed the sample video on my Drive. You can see the downloaded demo.mp4 video here. Now I write python detect.py, pass the weights file name, the source as demo.mp4, and device 0 for the GPU, and run this cell. The complete video is split into 1,314 frames, and detection runs on each frame one by one, so this takes some time. When it finishes, I will display the output video.

Inference on demo.mp4 is done. We used the GELAN-E model; GELAN stands for Generalized Efficient Layer Aggregation Network. I have also displayed the output demo video in the notebook, so let me download it and show you what our output looks like. We are using the extended model, the best performing among the GELAN models. Here, this truck is a wrong detection; this is also a wrong detection, because it is not a person; and this is another wrong detection, because it is not a truck either. Otherwise, we are able to detect the cars and trucks, so the results look fine to me.

Next, I download another video from Google Drive into the Colab notebook and run the GELAN model on this input video, testwalk.mp4. The complete video is split into 341 frames, and detection runs on each frame one by one, so this might take a few seconds. Inference on testwalk.mp4 is also done, and I have displayed the output video in the notebook. Let me download it and see what the results look like. We are able to detect persons and handbags, and the results look fine, but there are some false positives as well: this is not a tennis racket and this is not a frisbee, and here too it is detecting a tennis racket or a frisbee, which is wrong.

Now we will see how to run inference on these two videos, demo.mp4 and testwalk.mp4, using the YOLOv9-E model. YOLO stands for You Only Look Once. As the GitHub repo shows, to run inference with the GELAN models we use detect.py, and to run inference with the YOLOv9 models we use detect_dual.py. I run this cell on the demo video. The complete video is split into 1,314 frames, and detection runs on each frame one by one. I am using the YOLOv9-E model, the best performing among all the YOLOv9 models: it reaches a mean average precision of 55.6% on the MS COCO validation set, and it has more parameters than the other YOLOv9 models, 57.3 million. 760 frames are done, so let's wait for the remaining frames to complete.

Inference on the demo video with YOLOv9-E is complete, and I have displayed the output demo video here as well. Let me download it and compare these results with the GELAN results. These are the results from YOLOv9-E: this is a wrong detection (the GELAN model also detected this object as a truck), and this is also a false positive. So there are some false positives here as well, but I believe the GELAN model produced more of them.

Now let's run the YOLOv9-E model on the other video, testwalk.mp4. When we tried the GELAN-E model, there were some false positives: it was detecting some objects as a frisbee or a tennis racket. Let's see whether that is the case here as well. The complete video is split into 341 frames, and detection runs on each frame one by one. Here you can also see all the files I downloaded from Drive into this Colab notebook. Inference on this video is done, so let me download the output video and play it from the start. There are very few false positives: there is one false positive here, but otherwise the results look fine. With the GELAN-E model there were more false positives, and in this case there are far fewer, so that looks good.

In this tutorial, we have seen how to do object detection with GELAN and YOLOv9 models, and how to run these models in Google Colab to detect objects in images and videos. That's all for this tutorial. Thank you for watching.
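The setup steps described in the video (clone the repo, download the compact and extended weights, install the requirements) can be collected into a small Python helper that builds the shell commands. This is a sketch under assumptions: the repository URL https://github.com/WongKinYiu/yolov9 and the v0.1 release asset names yolov9-c.pt, yolov9-e.pt, gelan-c.pt, and gelan-e.pt are assumed from the release page and should be checked before use; the function name setup_commands and the choice to save the weights inside the cloned folder with wget -P are hypothetical.

```python
import shlex

# Assumed repository and release URLs; verify them on the GitHub release page.
REPO_URL = "https://github.com/WongKinYiu/yolov9"
RELEASE_URL = REPO_URL + "/releases/download/v0.1"

# Only the compact (c) and extended (e) weights were available at recording time.
WEIGHT_FILES = ["yolov9-c.pt", "yolov9-e.pt", "gelan-c.pt", "gelan-e.pt"]


def setup_commands(repo_dir="yolov9"):
    """Return the shell commands for cloning, downloading weights and installing."""
    cmds = [f"git clone {REPO_URL}"]
    # wget -P saves each weight file inside the cloned repo directory (our choice).
    cmds += [
        f"wget -q -P {shlex.quote(repo_dir)} {RELEASE_URL}/{name}"
        for name in WEIGHT_FILES
    ]
    cmds.append(f"pip install -r {repo_dir}/requirements.txt")
    return cmds


if __name__ == "__main__":
    for cmd in setup_commands():
        print(cmd)
```

In a Colab cell, each printed command can be run by prefixing it with ! (for example !git clone ...). The change of directory into yolov9 is better done with the %cd yolov9 magic, because !cd runs in a subshell and does not persist to later cells.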
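The video stresses that YOLOv9 checkpoints run with detect_dual.py while GELAN checkpoints run with detect.py, each with --weights, --source, and --device (0 for the first GPU, cpu otherwise). A minimal sketch of that rule follows, assuming only the four v0.1 checkpoints used in the video; the helper names pick_device and detect_command and the explicit filename table are hypothetical, and any other checkpoint is rejected rather than guessed.

```python
import os
import shlex

# Hypothetical lookup table for the four checkpoints used in the video.
SCRIPT_FOR_WEIGHTS = {
    "yolov9-c.pt": "detect_dual.py",
    "yolov9-e.pt": "detect_dual.py",
    "gelan-c.pt": "detect.py",
    "gelan-e.pt": "detect.py",
}


def pick_device():
    """Return "0" (first CUDA GPU) if PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch  # installed by the repo's requirements.txt
    except ImportError:
        return "cpu"
    return "0" if torch.cuda.is_available() else "cpu"


def detect_command(weights, source, device=None):
    """Build the inference command line for a YOLOv9 or GELAN checkpoint."""
    name = os.path.basename(weights)
    if name not in SCRIPT_FOR_WEIGHTS:
        raise ValueError(f"unknown checkpoint: {name}")
    if device is None:
        device = pick_device()
    args = ["python", SCRIPT_FOR_WEIGHTS[name],
            "--weights", weights, "--source", source, "--device", str(device)]
    return shlex.join(args)  # quotes paths that contain spaces
```

For example, detect_command("gelan-e.pt", "image1.jpg", device=0) produces the GELAN-E image command from the video, and swapping in "yolov9-e.pt" switches the script to detect_dual.py. A fixed table is used instead of matching filename prefixes so that any checkpoint not covered here fails loudly.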
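In the video, the first run saves its results in runs/detect/exp and the second in runs/detect/exp2, and the output image is then shown with Image from IPython.display. The sketch below finds the newest of these folders and displays an image from it; it assumes the exp, exp2, exp3, ... naming seen in the video continues the same way, and the helper names latest_run_dir and show_image are hypothetical.

```python
import re
from pathlib import Path


def latest_run_dir(project="runs/detect", name="exp"):
    """Return the newest run folder (exp, exp2, exp3, ...) or None if there is none."""
    pattern = re.compile(rf"^{re.escape(name)}(\d*)$")
    best, best_n = None, -1
    for path in Path(project).glob(f"{name}*"):
        match = pattern.match(path.name)
        if path.is_dir() and match:
            # A bare "exp" is the first run; "exp2" the second, and so on.
            n = int(match.group(1)) if match.group(1) else 1
            if n > best_n:
                best, best_n = path, n
    return best


def show_image(run_dir, filename):
    """Display a saved output image inline in a Colab/Jupyter notebook."""
    from IPython.display import Image, display  # only available in notebooks
    display(Image(filename=str(Path(run_dir) / filename)))
```

Used from inside the yolov9 folder, show_image(latest_run_dir(), "image1.jpg") would display the most recent annotated image. The numeric comparison matters because a plain string sort would rank exp9 above exp10.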

Info

Channel: Muhammad Moin

Views: 906

Keywords: yolov9, object detection, object detection using yolo, object detection in images, object detection in videos, object detection in images and videos with yolo

Id: CWr0mJ8y5M0

Length: 19min 0sec (1140 seconds)

Published: Tue Mar 19 2024