L-6 | Object Detection Using Faster-RCNN

Captions
Hello everyone, my name is Aarohi. In today's video I'll talk about object detection. Object detection is a computer vision technique that focuses on identifying and locating objects in an image. Object detection algorithms generally provide their output in the form of bounding boxes and class labels. What are bounding boxes? They are rectangular boxes drawn around the detected objects in an image or a video frame. Along with each bounding box, the algorithm assigns a class label to identify what object has been detected, and it also provides a confidence score for each detection. This is how object detection algorithms work.

Various deep learning object detection algorithms are already available. In today's class we will use one pre-trained object detection model and go through the code properly to understand how to use it. We will then try that model on an unseen image and see how it performs detection on it. In my next video I'll show you how to work on a custom object detection model, but before moving to that part it is important to understand how an object detection model works in general, so that is what we are going to cover today.

So let's start. This is my GitHub repo, and under Lecture 6 you will get the code that I'm going to discuss today: this is the Jupyter notebook I'm going to show you, and this is the image on which we are testing. Now let's go to the code. Remember, I have already shown you how to create a separate environment; whenever you want to try any code, you need to activate that environment and then you can work in it. How do you activate the environment? Just open
a command prompt and go to the drive where your environment is. Inside the D drive I have this folder where I created my environment, and now I am activating it. Now I'm opening Jupyter Notebook, and you will see it here. Under this Video 6 folder I have this code, and this is the Jupyter notebook I'm talking about. Let's open it, and let me zoom in first.

First we will load the pre-trained model. Which pre-trained model am I using in today's class? Faster R-CNN. You can see here how to load it: from the torchvision.models.detection module we are loading fasterrcnn_resnet50_fpn. This is a pre-trained model, trained on the COCO dataset. The COCO dataset has 80 different classes. If you want to know more about it, just search for "COCO dataset"; this is the official website. Open it, and under Dataset go to Explore: these are the 80 different classes of the COCO dataset, and the model I'm showing you today is trained on these 80 classes.

Faster R-CNN is the object detection model and ResNet-50 is its backbone. ResNet-50 is a convolutional neural network with 50 layers, and we are using it for feature extraction. Then we are using FPN. Why? To refine the feature extraction process. If you want to know more about ResNet-50 or FPN, how they work and what their architectures are, I have separate videos on these two topics on my YouTube channel; you can search for them and learn more. So this is the model we are loading, and here we are initializing the
model and passing these two parameters. pretrained=True means we want to use the pre-trained model. progress=False relates to what happens when you execute this cell (let me execute it): a download starts that fetches the pre-trained Faster R-CNN model, and progress=False means you don't want to see the progress bar on the screen. If you do want to see how much of the download has completed, set it to True.

Once the model is initialized, we call .eval() on it, which switches the model to evaluation (inference) mode. Why do we need it? This is a mandatory step if you want to use a pre-trained model: before making predictions you need to call .eval(), and then you can test your model. So this is how you load your pre-trained model.

Now that the pre-trained model is loaded, we want to make predictions. How? We need to select an image on which to perform the testing. For today's class this is the image. Here I'm just displaying it: I'm using the Python Imaging Library, importing Image from it, and displaying the image. So this is the image on which we want to perform the testing.

In the first or second video of this lecture series I told you that whenever you work in PyTorch you need to convert your input data into a tensor. Whether it is text, an image or a video, only then will a PyTorch model accept your data. So now we need to convert this image into a tensor. How do we do that? We are using transforms: from the torchvision module we are using this submodule,
the transforms module. Here we are just defining the transformation, meaning which transformations we want to apply to the input image. We are applying only one: converting the input image into a tensor. Sometimes you may want to resize your image; you can add that here as well. ToTensor converts the image into a tensor, so in this line we are just defining the transformation.

Now we want to apply the transformation to the test image. This variable holds the image and this one holds the transformation we defined; we apply all the transformations to the image, and we get this kind of output. This is our tensor. Earlier our input looked like this, but after converting it into a tensor, our input image looks like this. It is a 3D (three-dimensional) tensor whose first dimension has size three, the number of channels: this is a colour image with three channels, RGB. The first matrix is for the red channel, the second for the green channel and the third for the blue channel. You can also see that all the values are normalized between 0 and 1. So this 3D tensor will become the input to our model.

Before moving to that part: we have converted the image into a tensor, and now we are checking its shape. You can see we have three channels, then the height and width of the image. PyTorch accepts data in this format: channels (C) first, then the height and the width of the image.

We need to perform one more step before giving this input to the model: we need to add the batch dimension. This is the input tensor's shape right now, but what we want is this kind of input;
only then can we provide this input to our model. So here we are adding the batch dimension. Why? Because the model works on batches of images. unsqueeze simply adds a dimension of size one, and dim=0 means we want to add it at position zero. So we have a batch dimension of one, then the channels, height and width of the image. Now our input tensor is ready, and this tensor will become the input to the model.

One thing to note: today we are working on an object detection model, but whether you are working on object detection or image classification, the steps up to here are the same. You always need to convert the input image into a tensor, normalize the values between 0 and 1, and then add the batch dimension.

Now, the object detection model. Let's scroll up: here we initialized our model, so now we call that model and provide the input tensor. Then we display the prediction. This is our output; let's understand it. Under boxes we have the bounding box coordinates of the first box, then the second, the third and the fourth. This means our model has detected four objects in this image. These are the class labels of those boxes, given as IDs: 18 is the ID of some class, and likewise 52, 23 and 52. These are the classes detected by our model, and these are the confidence scores:
for the class whose ID is 18, our model is 99% sure that the object belongs to this class; for the next ones it is 50% sure, 16% sure and 6% sure. These are the scores for the four bounding boxes.

We were not able to see the results properly here, so let's visualize them. In this list I have written out the classes present in the COCO dataset so that we can map these IDs to their class names. Here I'm importing matplotlib and one of its submodules to plot the image and the bounding boxes. Remember, PyTorch accepts data in a different format: channels first, then the height and width of the image. But when you want to display the image on the screen, you have to change the order to height, width and then channels; that is the step we are doing here. Then we show the image. Here you can see the predicted boxes, labels and scores, and x1, y1, x2, y2 are the four variables that hold the bounding box coordinates. Then there is the label name. Using patches.Rectangle we draw the rectangles around the objects, and plt.text adds the text to those boxes.

Now let's see the output. For the dog we have the bounding box, then the class name, then 1.0
means the model is 100% sure that this is a dog. Then our model is this much sure that this is a bear, and that this is a broccoli.

Now let's refine the results. Let's write a condition: only show the bounding box if the confidence score is greater than 80%. This is the same visualization code we just discussed; the only thing I have added is another variable named confidence threshold, which I set to 0.8, meaning an 80% threshold. Inside the loop I'm saying: only if the score is greater than 80%, display the bounding box, the class label and the confidence score. You can see we now have a proper result: the dog and its confidence score.

So this is the kind of output you get from an object detection model: it locates the objects in an image or video, draws bounding boxes around them, and provides a class label along with a confidence score. The confidence score simply tells you how confident the model is that a particular object belongs to a particular class. The GitHub link for the code I have shown is mentioned; you can get the code from there and try it. I hope this video is helpful. Thank you for watching.
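
The steps described in the video (load the pre-trained model, convert the image to a tensor, add a batch dimension, predict, then keep only confident detections) can be sketched roughly as follows. This is a minimal sketch and not the notebook's actual code: the function names `filter_detections` and `detect` and the file name `test_image.jpg` are hypothetical, and the torch/torchvision imports are deferred into `detect` only so that the filtering helper can be tried without them installed.

```python
from typing import Dict, List, Sequence


def filter_detections(
    boxes: Sequence[Sequence[float]],
    labels: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.8,
) -> List[Dict]:
    """Keep only detections whose score is strictly greater than `threshold`."""
    kept = []
    for box, label, score in zip(boxes, labels, scores):
        if float(score) > threshold:
            kept.append(
                {
                    "box": [float(v) for v in box],  # x1, y1, x2, y2
                    "label": int(label),             # COCO class ID
                    "score": float(score),           # confidence in [0, 1]
                }
            )
    return kept


def detect(image_path: str, threshold: float = 0.8) -> List[Dict]:
    """Run the pre-trained Faster R-CNN on one image (hypothetical helper)."""
    # Deferred imports: an assumption of this sketch, not of the video.
    import torch
    from PIL import Image
    from torchvision import transforms
    from torchvision.models.detection import fasterrcnn_resnet50_fpn

    # pretrained=True follows the video; newer torchvision releases
    # deprecate it in favour of a `weights` argument (e.g. weights="DEFAULT").
    model = fasterrcnn_resnet50_fpn(pretrained=True, progress=False)
    model.eval()  # evaluation (inference) mode

    image = Image.open(image_path).convert("RGB")
    tensor = transforms.ToTensor()(image)  # shape (C, H, W), values in [0, 1]
    batch = tensor.unsqueeze(0)            # shape (1, C, H, W)

    with torch.no_grad():                  # no gradients needed for inference
        prediction = model(batch)[0]       # dict with boxes, labels, scores

    return filter_detections(
        prediction["boxes"].tolist(),
        prediction["labels"].tolist(),
        prediction["scores"].tolist(),
        threshold,
    )
```

Calling `detect("test_image.jpg")` would download the weights on first use and return a list of dictionaries, one per detection above the 0.8 threshold; the helper can also be exercised on its own with plain lists of boxes, labels and scores.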
Info
Channel: Code With Aarohi
Views: 4,083
Keywords: computervision, objectdetection, pytorch, ai, deeplearning
Id: AOosZVrTUbQ
Length: 16min 9sec (969 seconds)
Published: Wed Jan 24 2024