Segment Anything Model (SAM) with Grounding DINO to detect and extract objects from an image

Captions
Hey guys, today I'll show you how Grounding DINO and the Segment Anything Model (SAM) can work together to extract a specific object from an image based on the user's input text. Here you can see two example images. In the first image a person is sitting on a chair; if the user's input text is "person" and "chair", the Grounding DINO model will return bounding-box information for these two objects inside the image. Using this bounding-box information, the Segment Anything Model will generate masks for these objects, and finally we can extract the person and the chair from the original image. Similarly, here you can see another original image; if the user's input text is "mobile" and "cup", Grounding DINO will return bounding-box information for the detected objects, SAM will generate masks for them, and using these masks we will extract the detected objects, the mobile and the cup, from the original image. I will show you a step-by-step implementation of how you can extract specific detected objects from an image using Grounding DINO and the Segment Anything Model.

Before going to the implementation, let's see how the Grounding DINO model works. The model takes two inputs: the original image and the user's text. If the user's text is "person" and "horse" and we send this original image to Grounding DINO, the model will search for the horse and the person inside the image; if they are found, it will return the bounding-box information of the detected objects. Then we send these bounding-box coordinates to the Segment Anything Model, and SAM will generate masks for the detected objects. Here you can see the example: if we send the original image to the Segment Anything Model along with the bounding-box coordinates of the two detected objects, SAM will generate two separate masks for them, and using these two masks we can extract the detected objects from the image. So now we can start the step-by-step implementation in Google Colab. If you wish to know more about Grounding DINO and the SAM model, you can watch my two videos on those topics on my channel.

So let's get started with the implementation of how Grounding DINO and the Segment Anything Model can work together to extract two or more objects from an image based on your input text or input classes. First of all, we need to change the runtime: click here and set it to GPU. The nvidia-smi command will check the GPU status; here you can see a T4 GPU is assigned. Then we get the current working directory using the os.getcwd() command and store it inside the HOME variable. Here you can see the home directory; if you install or download anything, it will show up inside it. Then you need to clone Grounding DINO from this link, move into the GroundingDINO directory, and install all of its requirements. Just execute it; it will take a little time, and you can see the GroundingDINO directory created inside my home directory. Now we need to install the Segment Anything Model (SAM) from its git link, along with some other important packages: torch, torchvision and supervision, which we execute one by one.
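The captions don't include the actual cells, but a minimal sketch of the Colab setup being described, assuming the standard IDEA-Research and Meta repositories (the exact URLs aren't spelled out in the narration), could look like this:

    # Check which GPU Colab assigned (requires a GPU runtime).
    !nvidia-smi

    import os
    HOME = os.getcwd()   # current working directory, e.g. /content
    print(HOME)

    # Clone and install Grounding DINO.
    !git clone https://github.com/IDEA-Research/GroundingDINO.git
    %cd {HOME}/GroundingDINO
    !pip install -q -e .

    # Install Segment Anything and the other packages mentioned.
    !pip install -q git+https://github.com/facebookresearch/segment-anything.git
    !pip install -q torch torchvision supervision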
After the installs it may require restarting the runtime; here you can see the restart prompt, so you just need to click it. As we have restarted the runtime and we use the HOME variable again, you need to get the current working directory again, so just execute that line once more. Now we can move forward.

Inside the home directory I have created a weights directory, and inside the weights directory I have downloaded the Grounding DINO model weight from its git link. Then we need to download the model weight of the Segment Anything Model. As you may know, there are three types of SAM model: the ViT-H (huge) model, the ViT-L (large) model, and the ViT-B (base) model. Inside the weights directory I will download the Grounding DINO model weight and the SAM ViT-H model weight. Just execute this: the weights directory will be created, and the Grounding DINO and SAM model weights will be downloaded into it. If we refresh from here, we can see the weights directory is created and the Grounding DINO model weight is downloaded; after downloading the SAM weight and refreshing again, we can see the SAM ViT-H model weight as well. Next we assign the device; since a GPU is attached, we will get "cuda:0".

To initialize Grounding DINO we need two variables: the model config path and the model checkpoint path. For the checkpoint path, click "Copy path" on the downloaded weight file and paste it into the Grounding DINO checkpoint path variable. For the config path, open the GroundingDINO directory and then the config directory; inside it you will find the Python config files, and this one is the Grounding DINO config file, so copy its path and paste it into the config path variable. That is how we initialize the Grounding DINO model, so just execute it.

Here I have used OpenCV to read the image, and I will send this image to the Grounding DINO model together with the class names (or a text prompt); based on these classes or text, Grounding DINO will return the bounding boxes and some other information like the confidence and the class ID. So we upload an image inside the current working directory and rename it to make the later operations easier. I have read the image using OpenCV, converted the color from BGR to RGB while keeping a BGR copy of the original image, and resized it: the original image was too large, so I resized it to 1024 by 1024.
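Again as a sketch (the standard checkpoint file names and release URLs are assumptions, since the captions don't spell them out, and the image file name is hypothetical), the weight download, model initialization and image loading described above might look like:

    import os, torch, cv2

    WEIGHTS_DIR = os.path.join(HOME, "weights")
    os.makedirs(WEIGHTS_DIR, exist_ok=True)

    # Download the Grounding DINO and SAM ViT-H checkpoints into weights/.
    !wget -q -P {WEIGHTS_DIR} https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
    !wget -q -P {WEIGHTS_DIR} https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Initialize Grounding DINO from its config file and checkpoint.
    from groundingdino.util.inference import Model

    GROUNDING_DINO_CONFIG_PATH = os.path.join(
        HOME, "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py")
    GROUNDING_DINO_CHECKPOINT_PATH = os.path.join(
        WEIGHTS_DIR, "groundingdino_swint_ogc.pth")

    grounding_dino_model = Model(
        model_config_path=GROUNDING_DINO_CONFIG_PATH,
        model_checkpoint_path=GROUNDING_DINO_CHECKPOINT_PATH)

    # Read the uploaded image (hypothetical file name), resize it,
    # and keep both a BGR copy and an RGB copy.
    image_bgr = cv2.imread(os.path.join(HOME, "image1.jpg"))
    image_bgr = cv2.resize(image_bgr, (1024, 1024))
    image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)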
Now you can plot this image using supervision; supervision is used for image visualization and also for image annotation. Here you can see the image, and in it we can identify the cup and the mobile. So how will Grounding DINO work? Grounding DINO takes two major inputs: the class names, here "mobile" and "cup", and the image. I'll send the image and the class names, and the Grounding DINO model will return bounding-box information if the mobile and the cup are present inside the image, along with some other information like the confidence and the class ID. So the Grounding DINO model works with the image and the class names: if the classes are present inside the image, it returns the bounding boxes and the other information. Here you can see two bounding boxes, one here and another here, and also the confidence values and the class IDs 0 and 1, where 0 is the mobile and 1 is the cup.

Now we need to extract this bounding-box information, because we need to send it to the Segment Anything Model, and SAM will generate the masks over the detected objects. Here I have printed detections.xyxy; inside xyxy we get the two bounding boxes. I have kept the two bounding boxes inside the detected_boxes variable and the class IDs inside the class ID variable; here are the bounding boxes (the type is a NumPy array), and here you can see the class IDs. Now you can annotate the detected objects, the mobile and the cup, using supervision: the scene is the BGR image, the detections are the detections, and the labels come from the class IDs. So now I can plot the final detections: class 0 is the mobile and class 1 is the cup. In this way we can first draw the bounding boxes using supervision, and now I will send this bounding-box information to the Segment Anything Model to get the masks of these two objects.

Here is the initialization of the SAM model: the model type is vit_h, and this is the model checkpoint path, the sam_vit_h checkpoint; from here we can just copy its path. The model type and the checkpoint path are required to initialize the SAM model. Then I have assigned a mask predictor variable to predict masks based on the bounding-box information returned by the Grounding DINO model. So first I initialize the SAM model, then I send the bounding-box information returned by Grounding DINO to the Segment Anything Model along with the image, and based on this information SAM will return the masks of the segmented objects. As two bounding boxes were detected by Grounding DINO, I have initialized a loop, and inside the loop I send the bounding boxes one by one to the Segment Anything Model, which returns the masks. Here you can see the box information and the image; based on this image and this box, the Segment Anything Model will return the mask, and we'll plot these masks later, so you can execute this. We have also annotated the image based on the masks returned by the SAM model. Basically this is the code; if you wish to know more detail about the Segment Anything Model, you can check another video on my channel.
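A sketch of the detection, annotation, and SAM mask-prediction steps being described, continuing from the setup above (variable names like detected_boxes follow the narration; the thresholds are assumptions, and supervision's annotator API has changed across versions):

    import numpy as np
    import supervision as sv
    from segment_anything import sam_model_registry, SamPredictor

    CLASSES = ["mobile", "cup"]

    # Grounding DINO: detect the classes inside the image.
    detections = grounding_dino_model.predict_with_classes(
        image=image_bgr,
        classes=CLASSES,
        box_threshold=0.35,
        text_threshold=0.25)
    detected_boxes = detections.xyxy            # (N, 4) NumPy array of boxes
    print(detected_boxes, detections.class_id)  # boxes plus class IDs 0/1

    # Draw the boxes and labels with supervision.
    box_annotator = sv.BoxAnnotator()
    labels = [CLASSES[class_id] for class_id in detections.class_id]
    annotated_image = box_annotator.annotate(
        scene=image_bgr.copy(), detections=detections, labels=labels)

    # Initialize SAM (ViT-H) and its mask predictor.
    SAM_CHECKPOINT_PATH = os.path.join(WEIGHTS_DIR, "sam_vit_h_4b8939.pth")
    sam = sam_model_registry["vit_h"](checkpoint=SAM_CHECKPOINT_PATH).to(device=DEVICE)
    mask_predictor = SamPredictor(sam)

    # Predict one mask per Grounding DINO bounding box.
    mask_predictor.set_image(image_rgb)
    masks = []
    for box in detected_boxes:
        mask, score, logit = mask_predictor.predict(box=box, multimask_output=False)
        masks.append(mask[0])                   # mask has shape (1, H, W)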
The SAM model basically works in three ways. One is the automatic mask generator, where SAM returns masks for every segmented part of the image; the other two are masks generated from bounding-box information and masks generated from point coordinates. Here I have used the bounding-box information, because the Grounding DINO model returns bounding boxes based on the class names or the text prompt; so I have used the bounding boxes to generate masks with the SAM model. You can execute this.

First of all we can plot the annotated image; here you can see the image annotated based on the segmentation masks generated by the SAM model, and here you can see the mask shape. We need to change this shape to convert the masks into images. This code converts all of the masks generated by the Segment Anything Model into BGR images: I have used transpose to reshape the masks, then this line converts the mask into a NumPy array image, and this line converts the image from gray to BGR. Since the masks generated by the SAM model are 2D binary images, they are gray images, so we need to convert them from gray to BGR. Here you can see the previous mask shapes and the present mask shapes: the masks have been converted into three-channel BGR images. First I can draw all of the masks: here you can see the first mask, the mask for the cup, and here you can see the mask for the mobile; they are now images.

Next I have joined all of the masks generated by the Segment Anything Model using a bitwise OR operation: the two masks are joined together, and you can see both masks inside a single image. Now I perform a bitwise AND operation between this segmented image (the joined masks) and the BGR image, and then we plot the result. What happens is that we have successfully extracted these two objects, and every other part of the image is now black; here you can see the mobile and here you can see the cup. If we erase this black background, we can finish the extraction. Here is the code: I have replaced all of the zero pixels with the value 255, so the black region is removed, and here you can see the successfully extracted cup and mobile. Finally you can check all three images side by side: this was the original image, this is the annotated image, and this is the segmented image; we have successfully extracted the segmented part of the image based on my input text. So this is all about how Grounding DINO and the Segment Anything Model can extract objects from an image based on the input text or input classes.
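Putting the mask handling just described into code form (again a sketch continuing the earlier ones, not the notebook's exact lines):

    # Convert each 2D binary mask into a three-channel BGR image.
    mask_images = []
    for mask in masks:
        gray = mask.astype(np.uint8) * 255               # binary mask -> 0/255 gray image
        mask_images.append(cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR))

    # Join all masks into a single image with bitwise OR.
    joined_mask = mask_images[0]
    for mask_image in mask_images[1:]:
        joined_mask = cv2.bitwise_or(joined_mask, mask_image)

    # Keep only the masked pixels of the original image; everything else goes black.
    segmented_image = cv2.bitwise_and(image_bgr, joined_mask)

    # Replace the black background with white pixels for the final cut-out.
    segmented_image[joined_mask == 0] = 255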
I can show you the same pipeline on some other images: I have uploaded another two images and I will show you the results for them. You can also perform this operation on your own image: upload it and execute this code. I will put this Colab notebook link inside my video description box, so you can use the link to run it on your own image. Now I can read image 2; here you can see the image, a horse and a person. So we just need to send the class names, "person" and "horse", and then we can successfully draw the detections: the class ID of the person is 0 and the horse is 1. We don't need to initialize the SAM model again; we just need to execute the loop again to extract the masks based on the bounding-box information returned by the Grounding DINO model. Here you can see the person and the horse: two masks are generated successfully, and we can annotate the image with them; here you can see the masks. Then we convert all of the masks into BGR images; here you can see the shape of the masks, finally converted into three-channel BGR images. Now I can plot all of the masks, here are the two masks, and we join them together into the segmented image. Then we perform the bitwise AND operation between this segmented image and the BGR image; we get the image with a black background, and then we just need to convert the black region into white pixels. That is the basic idea. Then we can plot all of the images side by side: you can see we have successfully extracted the objects based on the text prompt; this is the original image, this is the annotated image, and this is the segmented image.

I renamed the next image to 3 and I can show you image 3; we just need to put 3 here and plot it. Here you can see a person sitting on a chair, so we send "person" and "chair"; you just need to change the class names here. Two bounding boxes are detected by the Grounding DINO model, we keep this bounding-box information inside the detected_boxes variable, and we can plot the annotated image; here you can see it successfully detected the person and the chair. We don't need to initialize the SAM model again; we just send the bounding boxes returned by the Grounding DINO model to the Segment Anything Model. Inside the loop I send the bounding-box information one by one to the Segment Anything Model, and SAM returns a mask for each bounding box. Here you can see the person and the chair, and this is the mask shape. We can draw them; you can see the image successfully annotated with the masks returned by the Segment Anything Model. Then we just need to convert these masks into BGR images, which is basically a matter of three lines of code, the transpose and the conversions inside this loop; here you can see the shapes converted into BGR images. We can plot the two masks separately, the person mask and the chair mask, and then join them together using the bitwise OR operation; here you can see the result. Now I perform the bitwise AND operation between the original BGR image and the segmented image, which gives this image with a black background; then we remove the black background with this line of code, and here you can see the extracted image. If we plot all three images side by side, this is the original image, this is the annotated image, and this is the segmented image. Basically, in this way you can also extract a specific part of an image based on your input classes or text using Grounding DINO and the Segment Anything Model.
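Since the same steps repeat for every new image, one way to bundle them is a small helper like the following (a hypothetical function stitched together from the sketches above, not code from the video; it reuses grounding_dino_model and mask_predictor from the earlier cells):

    def extract_objects(image_path, classes):
        """Detect `classes` with Grounding DINO, mask them with SAM, and
        return (original, annotated, white-background cut-out) images."""
        img = cv2.resize(cv2.imread(image_path), (1024, 1024))
        dets = grounding_dino_model.predict_with_classes(
            image=img, classes=classes, box_threshold=0.35, text_threshold=0.25)
        annotated = sv.BoxAnnotator().annotate(
            scene=img.copy(), detections=dets,
            labels=[classes[c] for c in dets.class_id])
        mask_predictor.set_image(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        joined = np.zeros_like(img)
        for box in dets.xyxy:
            mask, _, _ = mask_predictor.predict(box=box, multimask_output=False)
            gray = mask[0].astype(np.uint8) * 255
            joined = cv2.bitwise_or(joined, cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR))
        cut_out = cv2.bitwise_and(img, joined)
        cut_out[joined == 0] = 255
        return img, annotated, cut_out

    # e.g. for the horse-and-person image:
    original, annotated, segmented = extract_objects("image2.jpg", ["person", "horse"])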
So thank you very much; if you are interested in my next videos, please subscribe to my channel. Thank you all.
Info
Channel: SILICON VISION
Views: 1,909
Keywords: object detection, zero shot object detection, latest computer vision model, grounding dino, grounding dino model, grounding dino github, segment anything model, SAM model, image masking, image segmentation, image segmentation model, segment anything model github, grounding dino code example, grounding dino in pycharm, detect and extract object from image, zero shot computer vision model, image classification, computer vision, opencv, opencv-python, computer vision using python
Id: sQnI_4XjfUQ
Length: 21min 32sec (1292 seconds)
Published: Wed Aug 16 2023