Grounding DINO | Detect Anything | No Training | Zero Shot Object Detection

Video Statistics and Information

Captions
Hey guys, today I'll show you Grounding DINO, the latest state-of-the-art zero-shot object detection algorithm, and how you can implement it step by step in Python.

Before going into the implementation, we need to know what Grounding DINO is and how it works. Basically, Grounding DINO is a zero-shot object detection algorithm which detects candidate objects from a text prompt. So what is a text prompt? A text prompt may be either a string or a list: Grounding DINO can detect objects from a string, or it can detect objects from a list. Let's see what each of these means with an example.

Detecting objects from a string: if I send the prompt "a person with a laptop" along with an image to this algorithm, Grounding DINO will detect a person with a laptop, or something like coins on the table. We can also detect multiple objects from a string, like "mouse, pen, tissue boxes"; we send the string and the image to the Grounding DINO algorithm and it returns the bounding boxes accordingly.

The other way is to detect objects from a list: here we send the text prompt as a list, basically a Python list. We can send a list of classes like mouse, pen, wallet to the algorithm along with the image, and Grounding DINO will return the bounding boxes if those objects are found in the image. So this is how Grounding DINO detects objects in an image. I'll show you step by step: first detection from a string, then detection from a list.

So let's get started. First you need to clone Grounding DINO from git; for this we copy the clone command and paste it into the terminal. After cloning, a GroundingDINO folder is created under the object detection directory; here you can see the GroundingDINO folder after cloning from git. Inside you see the requirements.txt file listing torch, torchvision, supervision and so on; all of these packages have to be installed, and you can use pip for that. First install torch version 1.12.1; you can see I have already installed it, so it shows "requirement already satisfied". Then install torchvision, version 0.13.1; I have already installed all of these packages, which is why it keeps showing "requirement already satisfied". Then install supervision, which is used for visualization, like matplotlib but much nicer; you should install version 0.6.0. Now we move into the GroundingDINO folder, where you can see the requirements.txt and setup.py files, and we just need to execute the install command. I'll give all of these commands in the description box of my video; you only need to copy and paste them.
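For reference, the setup commands described so far are likely along these lines (the clone URL is the official IDEA-Research repository, and the package versions are the ones named in the video):

    git clone https://github.com/IDEA-Research/GroundingDINO.git
    cd GroundingDINO
    pip install torch==1.12.1
    pip install torchvision==0.13.1
    pip install supervision==0.6.0
    pip install -e .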
Then we have to create the Python file main.py under the GroundingDINO folder. First of all we import load_model, load_image, predict and annotate, and then we import supervision. First we try to load the Grounding DINO model, and for this we need the model config path; I'll show where to get it in a moment. I am assigning a config path variable and a checkpoint path variable here; the checkpoint is basically the model weights. Now we need to fill in the config path and the checkpoint path. Inside GroundingDINO there is a config folder, and inside it the SwinT config file; we assign this file's path to the config path variable, so you just need to copy it and paste it there.

Then one thing remains: the checkpoint path, which is basically the path to the model weights, so first we need to download them. To get the weights we just execute a command, but before that we create a folder inside the GroundingDINO directory; the model weights will be downloaded into it. We move inside this weights folder, and as you can see nothing is in it yet, so we just execute the download command (I'll give all of these commands in my video description box). In this way the model weights are downloaded into the weights folder. Now you can see the weights file; we copy its path and assign it to the checkpoint path variable. So for loading the Grounding DINO model we assign two variables: the config path and the checkpoint path. If everything is okay we get output from print(model); yeah, this is basically the model structure, so we have successfully loaded the model.

Now I create another directory for images; these images will be sent to the model for object detection. Here we add an image, captured with a mobile phone, assign an image path variable, and copy the image's path into it. Next we assign the variables we will pass to the model for detection. Grounding DINO detects objects based on the text prompt, so here the text prompt is "mouse", and we also assign the box threshold and the text threshold. Then we load the image from the image path and send it to the model. The model returns the bounding boxes of the detected objects, the confidence scores, and the object names; the object name is basically the text prompt, which is returned when a match is found. We can rename the model variable to my_gd_model, and the image that is sent to the model for detection we can rename to my_image. Then it looks good: the caption is the text prompt, plus the box threshold and the text threshold; all of these parameters are needed to predict the bounding boxes.

Maybe there is some issue with the image first, so let's check whether it's working; if we get the printed result then it's okay. Yeah, we are getting an error: load_image returns two values, and that's what was causing the issue. Hopefully it's okay now; this predict function will predict the bounding boxes of the mouse in this image. Let's check. We're getting "Torch not compiled with CUDA enabled". This is the big problem all of you face when running Grounding DINO, so here I will show you a simple way to solve it. Inside the groundingdino/util directory you can see inference.py; find the predict function, and inside it you can see the device is "cuda". That's where this "Torch not compiled with CUDA enabled" error comes from. If you are running on CPU there is no CUDA and no GPU, so we need to change "cuda" to "cpu".
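Putting the steps above together, main.py looks roughly like this as a sketch; the weights/ and images/ folder names and the image file name are placeholders from this walkthrough, and the two thresholds are typical values from the repository README rather than numbers fixed by the video:

    # main.py -- sketch of the string-prompt workflow described above
    from groundingdino.util.inference import load_model, load_image, predict, annotate
    import supervision as sv

    CONFIG_PATH = "groundingdino/config/GroundingDINO_SwinT_OGC.py"
    # checkpoint downloaded beforehand into the weights folder
    CHECKPOINT_PATH = "weights/groundingdino_swint_ogc.pth"
    IMAGE_PATH = "images/my_image.jpg"   # placeholder image path

    TEXT_PROMPT = "mouse"
    BOX_THRESHOLD = 0.35    # assumed typical value
    TEXT_THRESHOLD = 0.25   # assumed typical value

    # Load the Grounding DINO model from the config and the checkpoint
    my_gd_model = load_model(CONFIG_PATH, CHECKPOINT_PATH)
    print(my_gd_model)   # prints the model structure if loading succeeded

    # load_image returns TWO values: the original image as a numpy array
    # and a transformed tensor for the model (missing this caused the
    # first error shown in the video)
    image_source, my_image = load_image(IMAGE_PATH)

    # Detect objects matching the text prompt. Newer versions of the repo
    # accept device="cpu" here, which avoids editing inference.py by hand
    boxes, logits, phrases = predict(
        model=my_gd_model,
        image=my_image,
        caption=TEXT_PROMPT,
        box_threshold=BOX_THRESHOLD,
        text_threshold=TEXT_THRESHOLD,
        device="cpu",
    )
    print(boxes, logits, phrases)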
The device should be "cpu" because we are running on my PC and there is no GPU here. Let's check; hopefully it's solved and it detects the mouse in the image. Yeah, we got it: two bounding boxes, two mouses, and their two prediction percentages.

Now we need to annotate the image first and then display it. For display we use supervision, imported as sv, so we can write sv.plot_image here; we need to plot the annotated image, otherwise we can't interpret the result. Before displaying we annotate; annotate is already imported. For annotate you can see the image source parameter, which we get from load_image, so just copy it and paste it here; then the boxes, where we pass the detected boxes, which looks fine and will not conflict; and then the logits, which carry the confidence. To check the annotation is working we can simply print annotated_image.shape. There was an issue: we missed a parameter here, the phrases, which are basically the names of the detected objects. Let's see; hopefully it works now. Yeah, we get the image shape, which means annotation was successful. Then we display using supervision, passing the annotated image and the size. Let's check the "mouse" detection. Yeah, we got it: here you can see two mouses on the table, and the accuracy is very good, 82 percent, with another mouse at 76 percent. So this is how Grounding DINO detects objects based on the text prompt.

Now we can also inspect the bounding boxes, the confidences, and the object names individually; these are the three values returned by predict. You can check: yeah, we got two bounding boxes with confidences and object names, so it's working very well; Grounding DINO is a really fantastic object detection algorithm. Now you can check "wallet": yeah, we got it, with 58 percent accuracy. Grounding DINO is a zero-shot object detection algorithm, and we can detect multiple objects from one text prompt: "wallet, mouse, tissue box". Let's check. Yes, we got it, it's working very nicely; it's really amazing, tissue boxes with 68 percent accuracy. In this way we can change the prompt and detect different things: I can add the person, and you can add the water pot and the keyboard. Let's check. Wow, that's great: it detects the water pot, two keyboards as you can see, and here the water pot and the person; all of these objects are successfully detected from the text prompt.

We can also use natural-language text, like "person with laptop"; then what happens? It will return two bounding boxes, I think. Yeah: "a person with laptop" and "person", two bounding boxes, so it's really fantastic; in this way we can detect a person with a laptop. Now we can detect "coin on the table"; a language model is working inside Grounding DINO, and here you can see the table and the coins on the table, really amazing. Now let's check only "coins": nice, it detects all three coins very cleanly. Now I will detect "floor" in this image, and it returns the bounding box; fantastic, we got the bounding box of the floor. Now what if we send "tiles"? I think it will detect the separate tiles of the floor. Yeah, we got two tiles, with two separate bounding boxes for the tiles. I don't think we have ever seen this kind of detection algorithm before.
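The annotation and display steps described above, as a sketch continuing the same variable names:

    # Draw the predicted boxes, confidences and phrases on the original image
    annotated_image = annotate(
        image_source=image_source,
        boxes=boxes,
        logits=logits,
        phrases=phrases,
    )
    print(annotated_image.shape)   # sanity check that annotation worked

    # supervision displays the result (similar to matplotlib's imshow)
    sv.plot_image(annotated_image, (16, 16))

    # The three values returned by predict() can also be inspected one by one
    for box, logit, phrase in zip(boxes, logits, phrases):
        print(phrase, float(logit), box.tolist())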
Now we'll check "power cable on the table". Let's check. Wow, on the table we see the power cables, detected nicely, really fantastic. Now we check "pen on the table"; I actually put a lot of things on the table for these detections. It detects the pen nicely with 74 percent accuracy, and here is the surprising part: another pen is detected under the keyboard, with 51 percent accuracy. Very nice.

Now I'll show you the other way: detecting objects from a list. For that we create another file, since we will load the model in a different way; just copy the previous file to a new one. It will detect objects from a Python list of candidate classes. For this we go to the Roboflow Grounding DINO GitHub and copy some code from there; you just need to click the link to the automated dataset annotation notebook, and from it I will copy a few lines. You will also face some difficulties here, and I'll show you how to overcome them.

First we need to load the model. Previously we used the load_model function, but here we use a different approach: model = Model(...), and we also need to import this Model class. We copy the model config path from the previous code and delete the old loading code, because we want to keep the model name my_gd_model as before. Here we need to change this, and as you can see there is an error: we just need to import the Model class, and then it's gone. With a little modification we can copy the rest. Basically we will send a list to the model: previously we detected from a text prompt string, and now we send a Python list of class names, like mouse, wallet and tissue box; this is our list of classes, and we send these classes to the model. We read the image here using cv2.imread, so we import cv2, delete the old image loading, and copy the detection code from the Roboflow GitHub, pasting it here and deleting the previous version. In the previous file we used my_image, so here we also use my_image, and we copy over the box threshold and the text threshold. Inside the detection code you will see the enhance_class_name function: for each class in the list it adds some text, turning it into something like "all mouses" or "all tissue boxes". The classes list is passed through this enhance_class_name function, which returns the names with that additional text. Now the detection call is okay, and we print the detections to see whether everything works. Previously we used predict, but here we use predict_with_classes: we pass a list of classes, and based on it the model detects the objects. From the notebook we also copy the annotation code, removing the old version and pasting the new one in. We get an error here, so we just rename the image variable; and another error, where the image should be my_image. Yeah, now it's gone. Now we check it and get the same error again: "Torch not compiled with CUDA enabled". Since we import this Model class from Grounding DINO, we again go to the util/inference.py file; there you can see the Model class, and you need to change its device from "cuda" to "cpu", as we are running on CPU and have no GPU here. Hope that solves the problem. Let's check.
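As a sketch, the second script described so far might look like this; enhance_class_name is the helper copied from the Roboflow notebook, and the file paths are placeholders from this walkthrough:

    # main_list.py -- sketch of the list-of-classes workflow
    import cv2
    import supervision as sv
    from groundingdino.util.inference import Model

    CONFIG_PATH = "groundingdino/config/GroundingDINO_SwinT_OGC.py"
    CHECKPOINT_PATH = "weights/groundingdino_swint_ogc.pth"

    CLASSES = ["mouse", "wallet", "tissue box"]
    BOX_THRESHOLD = 0.35    # assumed typical value
    TEXT_THRESHOLD = 0.25   # assumed typical value

    # Helper from the Roboflow notebook: turns each class name into a
    # phrase like "all mouses" for the text encoder
    def enhance_class_name(class_names):
        return [f"all {name}s" for name in class_names]

    # Newer versions of the repo accept device="cpu" here instead of
    # editing util/inference.py by hand as done in the video
    my_gd_model = Model(
        model_config_path=CONFIG_PATH,
        model_checkpoint_path=CHECKPOINT_PATH,
        device="cpu",
    )

    my_image = cv2.imread("images/my_image.jpg")   # placeholder path

    # predict_with_classes takes a Python list of classes instead of a string
    detections = my_gd_model.predict_with_classes(
        image=my_image,
        classes=enhance_class_name(CLASSES),
        box_threshold=BOX_THRESHOLD,
        text_threshold=TEXT_THRESHOLD,
    )
    print(detections)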
Oh nice, we got four bounding boxes, but we also got an error. This is the big issue with the Roboflow GitHub code. The detections iterator returns four values, but the Roboflow code unpacks five; that's why we get "not enough values to unpack (expected 5, got 4)". Four values are returned, but the Roboflow code expects five, so we need to change it. Let's first confirm whether it's four or five: with for a, b, c, d in detections we simply print the values. We get no unpacking error anymore, so it's working: we got four separate bounding boxes, with four values in each loop iteration. So we need to adjust the Roboflow code, which should solve the previous problem: the first value is the bounding box, the second is the confidence, and the third is the class ID; so here you write the box, then the confidence, then the class ID. Hope that solves the issue; as you can see, there is perhaps a small mistake in the Roboflow code there. Let's check it again; we delete these debug lines and run the annotation. Let's check; hope it solves the issue. Yeah, we got it! This time we sent a Python list of classes to the predict_with_classes function, and we successfully got four bounding boxes.

So this is how Grounding DINO detects objects both from a text prompt and from a list of classes. You can change this list: let's add person, water pot and keyboard. Let's check; this is the list of classes, a Python list, that we send to the predict_with_classes function, and it returns the detected bounding boxes for this list. Yeah, we got it: the water pot, mouse, person, keyboard, and tissue box; it's really fantastic. In these two ways we can detect objects using Grounding DINO: one from a text prompt and the other from a list. It's really fantastic; hope you understand it clearly.
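For reference, a sketch of the corrected labelling and annotation code described in the captions; variable names continue from the previous sketch, and the fourth unpacked value is assumed to be the tracker id field that supervision 0.6.0 carries alongside each detection:

    # In supervision 0.6.0 each detection iterates as FOUR values:
    # (xyxy, confidence, class_id, tracker_id). The Roboflow snippet
    # unpacked five, hence "not enough values to unpack (expected 5, got 4)"
    labels = [
        f"{CLASSES[class_id]} {confidence:0.2f}"
        for _, confidence, class_id, _ in detections
    ]

    box_annotator = sv.BoxAnnotator()
    annotated_frame = box_annotator.annotate(
        scene=my_image.copy(),
        detections=detections,
        labels=labels,
    )
    sv.plot_image(annotated_frame, (16, 16))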
Info
Channel: SILICON VISION
Views: 2,492
Keywords: grounding DINO, computer vision, silicon vision, grounding dino object detection, grounding dino zero shot object detection, object detection, custom object detection, yolo custom object detection, object tracking, object detection and masking, SAM, SAM with Grounding DINO, grounding dino roboflow, grounding dino sam, grounding dino github, grounding dino paper, drounding dino segment anythings, grounding dino explained, grounding dino colab
Id: xbyQy5bbdDg
Length: 39min 32sec (2372 seconds)
Published: Wed Jun 21 2023