Vehicle and Pedestrian Detection Using YOLOv8 and the KITTI Dataset

Captions
Hello everyone, my name is Aarohi and welcome to my channel. In today's video I'll perform vehicle and pedestrian detection using the YOLOv8 algorithm and the KITTI dataset. But before moving to the practical implementation, let's understand why vehicle and pedestrian detection is important.

Let's start with autonomous driving. In the development of autonomous vehicles, accurate detection of vehicles and pedestrians is essential: self-driving cars need to see other vehicles and people clearly, and they use this information to decide where to go, how fast to drive, and how to move safely. If we talk about traffic management, monitoring and analyzing traffic flow relies on the detection of vehicles and pedestrians; this information is used for optimizing traffic signals, managing congestion, and planning infrastructure improvements. Now let's talk about road safety. Detecting vehicles and pedestrians is important for preventing accidents and ensuring road safety: by identifying these objects in real time, advanced driver assistance systems can provide timely warnings to the driver, which helps avoid collisions and ultimately reduces the number of traffic accidents.

Okay, now let's see the dataset. Type "KITTI dataset" and open the first link. This is the KITTI dataset website. KITTI is a popular dataset used for computer vision tasks, particularly in the context of autonomous driving, and since its release it has become one of the most cited and used datasets in autonomous driving and computer vision research. The data is collected from a variety of urban, suburban, and highway environments, offering a broad spectrum of real-world scenarios, and this diversity helps in training models that can operate under different settings and conditions, which makes them more versatile and reliable. The KITTI dataset includes detailed annotations for vehicles, pedestrians, cyclists, and some other objects in both 2D and 3D format, so we can perform 2D object detection, 3D object detection, tracking, and lots of other computer vision tasks with it.

Before showing you the 2D and 3D data, let's click on "setup". If you click on setup, you will see the entire setup of the car on which they collected this data: the sensor setup, all the sensors and their details, and everything they used to equip the car for data collection is described on this page. Now look at the "object" section. Under it you can see 2D object, 3D object, and bird's eye view, which means the dataset has 2D object annotations, 3D object annotations, and a bird's eye view benchmark. For today's problem we are performing 2D object detection, so we go to the 2D object page. On that page you can see the first link, "Download left color images of object data set", which is 12 GB. We will download this file: just click on it, the download will start, and you will get a zip folder inside your Downloads folder. To download the training labels, click on the link "Download training labels of object data set", which is 5 MB; when you click on it, you will get a zip file inside your Downloads folder as well.
Now let me open the Downloads folder and show you where these files are. Here you can see the images archive, which I have unzipped into this folder, and the other folder with the training labels, also unzipped. Let's open these folders one by one and see what is inside.

First I'm opening the images folder, the data object image folder. Inside it you get two folders, training and testing. Inside the training folder there is one more folder, and inside that you can see all the images: we have 7481 training images. Let's open one image so you can see what these training images look like. Now let's go back and open testing; inside testing we have the test images, 7518 of them.

Now let's look at the corresponding annotation files, which are text files. For that we open the label folder, and inside it you can see that we only have a training folder. Under images we had two folders, training and testing, but under labels we only get the labels for the training set. Open it and you can see we have 7481 annotations in text format. So we do not have annotations for the test files; we only have annotations for the training data.

Now let's open one of the annotation files and try to understand what kind of values it contains. This is the first annotation file: it has a class name and then a series of values. Each line in the annotation file describes one object in the scene. In this text file we have only one line, which means there is only one object in this image; if you have an image with three different objects, you will have three lines, and each line gives you the information for one object.

Now let's go through the fields of this annotation file. The first field is the object type; in this case it is Pedestrian, and it indicates the type of object annotated. In the KITTI dataset the object classes are Car, Pedestrian, Van, Cyclist, Truck, Misc (miscellaneous), Tram, Person_sitting, and DontCare. The DontCare class simply refers to a region of an image that contains objects which are not of interest for our specific task.
After the object type we have 0.00, which is the truncation value. Truncation measures how much of the object is cut off: it is typically between 0 and 1, where 0 means the object is fully visible and 1 means it is not visible. In our case it is 0, which means the object is fully visible.

Then we have the next value, which is the occlusion. It indicates the level of occlusion of the object: 0 means the object is fully visible, 1 means the object is partially occluded, 2 means the object is largely occluded, and 3 means unknown. So here you can have the value 0, 1, 2, or 3, and that is what each value means; in our case it is 0.

Then we have -0.20, which is the observation angle of the object relative to the camera view.

After that we have four values which form the bounding box. For object detection we need a bounding box, which simply means a rectangular box around the object, and for that we need four coordinates. These four values define the pixel coordinates of the bounding box surrounding the object in the image: the first one is x_min, then y_min, then x_max, and the fourth one is y_max.

After that we have three values, 1.89, 0.48, and 1.20, which are the 3D dimensions of the object: height, width, and length. The height here is 1.89 m, the width is 0.48 m, and the length is 1.20 m. Then we have the location: three values that give the 3D location of the object in camera coordinates (right, down, forward), also in meters. Finally we have the last value, 0.01, which is the rotation of the object around the y-axis (pointing up), in radians.

And that's it, guys. Every annotation file in the KITTI dataset has this kind of detail: the object type, truncation, occlusion, observation angle, the 2D bounding box, and the 3D dimensions and location. All these details are given for each object.

Now that we have downloaded the data, our next task is to convert these annotations into the format which the YOLO model accepts. For the YOLO model we only need the class label and the bounding box, so we need the class name and these four bounding box values; apart from that, we don't need any other detail. So you have to write a script that converts the annotations into the YOLO format. I have already converted these annotations into the YOLO format; let me show you that.
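For reference, here is a minimal sketch of the kind of conversion script the video refers to. It is not the exact script used in the video; the folder paths and the class-to-index mapping below are assumptions, and note that YOLO labels use a normalized center x, center y, width, height rather than pixel corner coordinates.

```python
# Hypothetical KITTI -> YOLO label conversion sketch (not the exact script from the video).
# Assumptions: the image/label folder paths and the class order below; adjust to your setup.
from pathlib import Path
from PIL import Image  # used only to read image width/height for normalization

KITTI_IMAGES = Path("data_object_image_2/training/image_2")  # assumed path
KITTI_LABELS = Path("training/label_2")                      # assumed path
YOLO_LABELS = Path("kitti_yolo_labels")                      # output folder
YOLO_LABELS.mkdir(exist_ok=True)

# Assumed class order; it must match the 'names' list in your data.yaml.
CLASSES = ["Car", "Pedestrian", "Van", "Cyclist", "Truck",
           "Misc", "Tram", "Person_sitting", "DontCare"]

for label_file in KITTI_LABELS.glob("*.txt"):
    img_path = KITTI_IMAGES / (label_file.stem + ".png")  # KITTI images are PNG files
    with Image.open(img_path) as img:
        img_w, img_h = img.size

    yolo_lines = []
    for line in label_file.read_text().splitlines():
        parts = line.split()
        cls_name = parts[0]
        if cls_name not in CLASSES:
            continue
        # Fields 5-8 of a KITTI label line are the 2D box: x_min, y_min, x_max, y_max (pixels).
        x_min, y_min, x_max, y_max = map(float, parts[4:8])
        # Convert to YOLO format: normalized center x, center y, width, height.
        xc = (x_min + x_max) / 2.0 / img_w
        yc = (y_min + y_max) / 2.0 / img_h
        bw = (x_max - x_min) / img_w
        bh = (y_max - y_min) / img_h
        cls_id = CLASSES.index(cls_name)
        yolo_lines.append(f"{cls_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")

    (YOLO_LABELS / label_file.name).write_text("\n".join(yolo_lines))
```

Whether you keep the DontCare regions as a class or filter them out is a design choice; since the data.yaml shown later in the video lists nine classes, this sketch keeps all nine.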
So, guys, this is my folder where I have the dataset in YOLOv8 format, along with my training code and everything else. Let's open this dataset folder. Inside it I have created three folders: train, test, and val. Inside train I have two folders, images and labels, and inside the images folder I have pasted the training images that we downloaded. How many training images do we have? 7481. So what I have done is split the training images downloaded from the KITTI dataset into two parts, train and val: I pasted 480 images into the validation folder and the remaining 7,000 images into the training folder. In the same way I pasted the corresponding label files under the labels folders, so we have 7,000 labels for the training images and 480 labels for the validation images.

Now let me open a label file and show you what these files look like now. This is the first file: here you can see the class ID and the four bounding box coordinates. So now we have our annotation text files in the format that the YOLO model accepts, and we can train our model.

For training we need to install the ultralytics package; this is how you install it. After installing it, I'm just showing you the version of ultralytics that I am using, my torch and CUDA versions (I'm using CUDA 11.8), and the GPU I'm working on. Now let's import ultralytics. I don't want to train my model from scratch, so what I'm doing is loading a pre-trained YOLOv8 model and then fine-tuning it on my custom dataset. We load the pre-trained model, and in the model.train line we pass a data.yaml file.

Let's open this data.yaml file and see what is in it. Here I have provided the path of my training images and the path of my validation images; we have a total of nine classes, and these are the names of those nine classes. So this is my data.yaml file, and using this file the YOLO algorithm will be trained on the KITTI dataset.
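As a reference, here is a minimal sketch of what such a data.yaml and the training call might look like. The exact paths, class order, batch size, and image size are assumptions; the video only states that it trains for 200 epochs.

```yaml
# data.yaml (sketch) - paths and class order are assumptions
train: /path/to/dataset/train/images
val: /path/to/dataset/val/images

nc: 9
names: ["Car", "Pedestrian", "Van", "Cyclist", "Truck",
        "Misc", "Tram", "Person_sitting", "DontCare"]
```

```python
# Training sketch with the ultralytics package (pip install ultralytics)
from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # load a pre-trained checkpoint (the exact variant is an assumption)
model.train(
    data="data.yaml",           # the dataset config sketched above
    epochs=200,                 # the video trains for 200 epochs
    imgsz=640,                  # assumed image size
    batch=16,                   # assumed batch size
)
```

Training writes its outputs to a runs/detect/train folder, which is what the video inspects next.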
So I have trained the model for 200 epochs with this batch size and this image size. After the training you will get a runs folder; inside the detect folder you will see a train folder, open it, and inside weights you can see the last and the best weights. This is the trained model, and we will use it to make predictions on unseen data. Here you can open the confusion matrix, and if you want to see the results, these are the mAP values at a threshold of 50 and the mAP values when the threshold is between 50 and 95.

So now our model is trained; let's test it. This is how you train the model, and suppose for any reason your training stopped in between, then you can resume the training using this code: you just need to provide the path of the last weights, load the model from it, and call model.train with resume=True. That is how you resume the training.

Now let's see how to perform predictions on unseen data. For that you provide the path of your trained model, which is inside this folder. If you want to test on images, just give the path of the image, and save=True will save your results in the runs folder. If you want to test on a video, this is the video on which I want to perform the testing, again with save=True. Now let me show you the output video. Let's go here and open the first video. Let me pause it: here you can see we are detecting the pedestrian and car classes, and the cyclist is getting detected as well. Now let's see the prediction on one more video.

So guys, this is how you perform vehicle and pedestrian detection using YOLOv8 and the KITTI dataset. I hope this video is helpful, and if you liked it, please like, share, and subscribe to my channel. Thank you for watching.
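To recap the resume and inference steps described above, here is a minimal sketch using the ultralytics API; the weight paths and the test image/video file names are assumptions.

```python
from ultralytics import YOLO

# Resume an interrupted training run from the last saved checkpoint
model = YOLO("runs/detect/train/weights/last.pt")
model.train(resume=True)

# Load the best trained weights for inference
model = YOLO("runs/detect/train/weights/best.pt")

# Predict on a single image; save=True writes the annotated result under the runs folder
model.predict(source="test_image.png", save=True)

# Predict on a video file; the annotated video is also saved under the runs folder
model.predict(source="test_video.mp4", save=True)
```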
Info
Channel: Code With Aarohi
Views: 1,251
Keywords: yolov8, objectdetection, computervision, ai, artificialintelligence, yolo
Id: g4Q1tW988eI
Length: 18min 32sec (1112 seconds)
Published: Sun Feb 18 2024