Stable Diffusion Inpainting with Segment Anything Model (SAM)

Video Statistics and Information

Captions
Hello everyone, and welcome to my YouTube channel. In today's video we are going to take a look at how we can combine the Segment Anything Model (SAM) with the Stable Diffusion model to create something really cool. I'm not going into the details of how the Segment Anything Model works; a lot of people have already done that, and you can also read the paper if you want to. But this is a really cool model. The best part is that it's open source, and the model weights and the data are also available, so you can build a lot of cool things using it. It seems like the model is quite good, so let's give it a try and see how we can combine the Segment Anything Model with the Stable Diffusion model.

For today's video we are going to be using the GitHub repository for the Segment Anything Model, so you should take a look there. Here is the GitHub repository, and here you can see instructions on how to install it. That's what I did; I installed it this way. I already had all these dependencies installed; if you don't have them, just install them. Then I downloaded the weights for the ViT-H SAM model. So I'm going to be using this library, then I'm going to be using Stable Diffusion from Hugging Face Diffusers, and I'll be using Gradio to build the UI.

Let's get started. As you can see, I have created a workspace and I have already downloaded the weights. I'll be using ViT-H, which is the largest one. I did this a couple of hours ago and I really liked it, so I thought I would share it with you. I do have a reference that I will use while writing some of the code, but it's quite easy to do it yourself too. Let's start by creating a file called app.py, and we will import some of the stuff that we need, so let's import gradio. Now we have our imports in place, and these are the libraries that we will be using today. I can just write sam_checkpoint here. This is the path to the weights;
it's in my weights directory, and I think I can just copy the name from here: copy path, so I can just copy the relative path. Okay, so I have the weights here. Now I have to define the model type, and this comes directly from the example in the Git repo of SAM, so it's vit_h. Here I can also define the device that I will be using, which is cuda. Then I have the SAM model itself. To use it, you call sam_model_registry with the model type, and then you define the checkpoint, so checkpoint is sam_checkpoint. Now you have the SAM model, and then we send the model to the device, just like we do for any PyTorch model. We also have to initialize the predictor class, which is SamPredictor, and here you pass the model.

The next thing that we need today is the Stable Diffusion inpainting pipeline. I can just say pipe equals StableDiffusionInpaintPipeline.from_pretrained, and here I have the model name, so from stabilityai, stable-diffusion-2-inpainting. I can set the torch dtype, so I can use float16 to save some memory. So we got the pipeline for Stable Diffusion, and we can also send it to cuda. So we got the basic components up and running. If you have not downloaded this model before, the Hugging Face Hub is going to download it for you.

Now, what are we going to build? We are going to build a simple demo in which we input an image, select some part of the image, and then replace that part given some prompt. We can also replace the background instead of the object. We initialize a Gradio block with gr.Blocks() as demo, and then we have a bunch of rows. Let's say we have with gr.Row(), and here we have the input image, which is gr.Image, with the label Input. We have the mask image, which is the same thing (I should use GitHub Copilot for this); this will be called Mask, and
the output image is the same again and will be called Output. So I got one row in which I have these three image components. Then I get another row, and here I can have prompt_text, for the kind of prompt I want to use. This is gr.Textbox with lines equal to 1, so a single line (you can use multiple lines if you want), and label equal to Prompt. I'm just going to use the positive prompt, not the negative prompt, but you can have as many prompts as you want. Now we add one more row, with gr.Row(), and this row will have a submit button, so submit equals a button labeled Submit. So we got the three image fields, a text box, and a submit button.

How the SAM predictor works is that you have to supply it with a bunch of coordinate values for the image, X and Y coordinates, and one label for each. If I click on a point in the image, it generates a mask for me, and if I think it's not good enough, I can give it more points with the same label. That's what I'm going to be doing today.

But before that, I will write a simple function called inpaint. The inpaint function will take an image, a mask, and a prompt. We are using the Stable Diffusion 2 inpainting model. I'm going to convert the image to a PIL image with Image.fromarray, since these are NumPy arrays, and here I can write image, and I can do the same thing with the mask. So I got one image and one mask, and now we need to resize them to 512 by 512: image equals image.resize with 512, 512, and mask equals mask.resize with 512, 512. My output will come from the pipeline, and here also I'm using the default parameters: prompt is prompt, image is image, and mask_image is mask. I'm not doing anything interesting here; I'm just using it from the example, and here
I can say .images to get all the images, and I only need the first one. Very simple. And here I can return the output.

I had a few components here, including the submit button. How I want to build this is that if someone clicks anywhere on the input image, a mask should be created, and then the mask should be used for inpainting. Gradio is about to include a feature for this; it's still in a pull request, so I installed Gradio from source, but with it you can now click on an image and get the pixel coordinates. So here I can write input_image.select. This will be a select event, and I will use a function called generate_mask. This function will take the input image as the input and provide the mask image as the output.

generate_mask is something that we have not written yet, so we can just write the function here. It will take the input image and an event, which I'll just call evt, of type gr.SelectData. The selected pixel is going to be stored in evt. I'm not coding in a very great way today, but this works: I make selected_pixels a list, and here I can say selected_pixels.append with evt.index, so this will store the currently selected pixel. Once you have the selected pixel, we can tell the SAM predictor to set the image as image, which is the original image input by the user. Then we create the input points, so let's call it input_points, which is an array of X, Y coordinates: np.array of selected_pixels. For the input labels, we will treat every point as foreground (there are only two labels), so np.ones of input_points.shape[0], giving as many labels as you have points. Then the mask comes from calling predictor.predict, so this is SAM, and inside that we have point_coords, which is input_points, and point_labels is the input
labels. You can have multiple masks as output, but I'm going to simplify it: I just need one mask, and it will return a mask in the shape of 1, size, size. If you have multiple masks, you will have more than one here. Now I have to convert the mask to a PIL image, so Image.fromarray of mask, and here I will use the first one. So the mask has now been converted into a PIL image, and now I can return the mask.

So what's happening is that whenever someone clicks on the input image, it sends the point to selected_pixels. Using selected_pixels we construct input_points and input_labels, which are then used along with the image to create a mask from SAM, and the mask is displayed in mask_image. Now we also need the inpainting part, so I can just do submit.click, and here I have the inpaint function, which has the inputs input_image, mask_image, and prompt_text, and the output is output_image. So far so good. Now, to launch the demo, you can just launch the Gradio demo using demo.launch() under the if __name__ check.

A quick walkthrough: we have the input image, mask image, output image, prompt text, and the submit button. Whenever someone clicks on any point in the input image, it gets recorded in selected_pixels, which in turn creates the input_points array; input_labels is the same size and all ones. Then we use the SAM predictor to predict a mask using the image and the input points, and that mask is returned. The mask, in turn, is used for inpainting using Stable Diffusion and the Hugging Face Diffusers library.

Let's see if it works. I'll just forward this port, and here I can say python app.py or gradio app.py. It will take a few seconds to start, and then we can go to Firefox and see if we have something there. I got an error saying gradio has no attribute Block; obviously it should be Blocks. And here we are not using Blocks; we will be using Row. So that should take care of this error. Okay, let's launch it using gradio. If you launch
it using the gradio command, then whenever you make changes in the code, it restarts automatically. For me all the files were already downloaded, so let's go and see if we are getting something in the web browser.

So the app is working. Let me upload an image, an image of a girl. If I click somewhere (I hope you can see the cursor; if not, I'm clicking somewhere on the jacket), it should generate the mask. It generated the mask, and if I want to improve it, I can keep clicking on the jacket. If it doesn't find some part on the right side, I will click there and the mask will change. Now I can write the prompt, girl wearing red jacket, and click on Submit. Using the mask that has been created by the SAM model, and the Stable Diffusion inpainting model, we have a new, different image. So yeah, it's pretty cool. If I want to select more, I can. Now I have selected everything by clicking at different points, so let's see if I can change this and if it still works. Okay, something bad happened, because for some reason I have now selected the face, and I shouldn't have selected that. But that's okay; you get the idea.

Now I want to show you something else that you can do with it, so let's go back to VS Code. What we can change here is the background of the image. The mask is a binary array, so I can say np.logical_not of mask. Now it will keep everything in the foreground, and the background will change, which I think is also quite cool. Let's see how that works. I think this has loaded again, and here is my app. Let me add the same picture, and now I will click on the image, and you see the foreground is black; previously the background was black. So now I've selected the girl in the picture, the object, and the prompt is a girl on a beach. Let's submit and see what
happens. The results are pretty cool, right? You can keep clicking on Submit and it will give you different results each time, and you can choose the one you want. So now you can take your own picture and keep changing the background. I think this is stuck now. Okay, here it is. So that's it for today's video. I hope you liked it. Do click on the like button, do subscribe, and share the video with your friends. Goodbye, everyone.
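
The code itself is only visible on screen in the video, so here is a minimal sketch of what an app.py along the lines described in the captions might look like. It is not the author's exact file. The helper names build_point_prompts, mask_to_uint8 and build_demo, the invert flag, and the scaling of the mask to 0/255 before converting it to a PIL image are assumptions made for illustration; the library calls (sam_model_registry, SamPredictor, StableDiffusionInpaintPipeline, gr.Blocks, gr.SelectData) are the ones named in the captions.

```python
import numpy as np


def build_point_prompts(pixels):
    """Turn a list of clicked (x, y) pixels into SAM point prompts."""
    input_points = np.array(pixels)
    # Every click is treated as a foreground point (label 1), as in the video.
    input_labels = np.ones(input_points.shape[0])
    return input_points, input_labels


def mask_to_uint8(mask, invert=False):
    """Take the first mask of a (1, H, W) boolean array and scale it to 0/255.

    invert=True applies np.logical_not so the background is repainted instead.
    The 0/255 scaling is an assumption, not something stated in the video.
    """
    first = mask[0]
    if invert:
        first = np.logical_not(first)
    return first.astype(np.uint8) * 255


def build_demo(sam_checkpoint, model_type="vit_h", device="cuda", invert=False):
    # Heavy imports stay local so the helpers above work without them.
    import gradio as gr
    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from PIL import Image
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
    sam.to(device)
    predictor = SamPredictor(sam)

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
    )
    pipe = pipe.to(device)

    selected_pixels = []  # click history shared across events, as in the video

    with gr.Blocks() as demo:
        with gr.Row():
            input_image = gr.Image(label="Input")
            mask_image = gr.Image(label="Mask")
            output_image = gr.Image(label="Output")
        with gr.Row():
            prompt_text = gr.Textbox(lines=1, label="Prompt")
        with gr.Row():
            submit = gr.Button("Submit")

        def generate_mask(image, evt: gr.SelectData):
            # evt.index holds the pixel coordinates of the click.
            selected_pixels.append(evt.index)
            predictor.set_image(image)
            input_points, input_labels = build_point_prompts(selected_pixels)
            mask, _, _ = predictor.predict(
                point_coords=input_points,
                point_labels=input_labels,
                multimask_output=False,
            )
            return Image.fromarray(mask_to_uint8(mask, invert=invert))

        def inpaint(image, mask, prompt):
            # Gradio passes NumPy arrays; the pipeline is given 512x512 PIL images.
            image = Image.fromarray(image).resize((512, 512))
            mask = Image.fromarray(mask).resize((512, 512))
            return pipe(prompt=prompt, image=image, mask_image=mask).images[0]

        input_image.select(generate_mask, [input_image], [mask_image])
        submit.click(
            inpaint,
            inputs=[input_image, mask_image, prompt_text],
            outputs=[output_image],
        )

    return demo
```

Calling build_demo with the path to the downloaded ViT-H checkpoint and then calling launch() on the returned object should start the app. Passing invert=True corresponds to the background-replacement variant shown near the end of the video.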
Info
Channel: Abhishek Thakur
Views: 25,779
Keywords: machine learning, deep learning, artificial intelligence, kaggle, abhishek thakur
Id: CERvlvUvVEI
Length: 23min 8sec (1388 seconds)
Published: Mon Apr 10 2023