Hello and welcome to this video, in which I would once again like to trade a bit of lifetime for knowledge. Today we are looking at the SAM model, the Segment Anything Model, and if you are interested, this is its page. The whole thing was developed by Meta AI, so Meta is behind it in one way or another. What it does is, as it says right here: with this AI you can select segments of an image via text input or other input options that we will see in a moment. It masks them and gives us the opportunity to swap out parts of a picture, and that with pure text input. If you are interested, I will put the link in the description. There are demos on the site, but also the scientific papers, etc. This is the official site, and that is what we will look at today: SAM, the Segment Anything Model.

We do the whole thing with the help of the ComfyUI Impact Pack. I have already made a video about how to install it; of course I will put that link in the description too, so for the installation of the pack, take a look at my Impact Pack video. I've been playing around with SAM a bit lately, because I hadn't dealt with it myself before, but it interested me a lot, and I think I can show you a workflow today that already works quite well. I assume there are other workflows too, but to get started and actually achieve quite good results, that is what we will do over the course of this video.

On the Impact Pack GitHub page there is something hidden further down: there are tutorials, and I first had to fight my way there — not through the tutorials themselves, but just to find them. The one we need is the ClipSeg tutorial, because we will need this ClipSeg, the CLIP-based segmentation. There is some more information at the very bottom, and it says there that you need the ComfyUI ClipSeg custom node as a prerequisite. It is not included in the Impact Pack; we have to install it separately. It is a prerequisite because the CLIPSeg Detector Provider node in the Impact Pack is a wrapper for that node — meaning the node we need uses the functionality of the other one. If you click on it here — I'll put that link in the description as well — you come to the GitHub page of said node.

To download it, you can of course clone the Git repository as usual. But I have taken the easy way for this one: just click on the custom_nodes folder up here, then you can already see clipseg.py. That is the custom node; click on it and the code appears, but you have to press Raw up at the top right, and then the file with the Python code opens. From here you can save it. To do that, go to your ComfyUI_windows_portable folder, then ComfyUI/custom_nodes. I already have it in here. If you look down here, you can see that the suggested file name is pretty unwieldy, so go ahead, delete everything except clipseg.py and then save the file. If you would rather script this download step, there is a small sketch right below. I already have the file, so I'm going to cancel now. But then you have this node. So, back again — zack, there we are.

Exactly. Then I would say, let's dive in. I still use the tinyterraNodes pack because it's just super convenient — you could actually say it's my favorite pack — because we can always build a really simple base setup by loading these two nodes, connecting them with this pipe, and done. There is also a video about it from me; feel free to check it out.
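If you prefer to script that download step instead of clicking through GitHub, here is a minimal sketch in Python. The raw URL and the portable-folder path are assumptions based on the repository layout shown in the video, so check them against the actual repo before running it.

```python
# Minimal sketch: fetch the ClipSeg custom node file into ComfyUI's custom_nodes folder.
# The raw URL and the portable-folder path are assumptions - adjust them to your setup.
import urllib.request
from pathlib import Path

RAW_URL = "https://raw.githubusercontent.com/biegert/ComfyUI-ClipSeg/main/custom_nodes/clipseg.py"  # assumed path
TARGET = Path("ComfyUI_windows_portable/ComfyUI/custom_nodes/clipseg.py")

TARGET.parent.mkdir(parents=True, exist_ok=True)
urllib.request.urlretrieve(RAW_URL, TARGET)
print(f"Saved {TARGET} - restart ComfyUI so the node gets picked up.")
```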
If you visit this channel more often, you know that I prefer to work with this node pack. Well, that's our base setup; now let's make a few adjustments. I take the RevAnimated model for this — we've come across it before, I'll put a link in the description and will probably introduce it in a separate video. As VAE we take the Stable Diffusion VAE. For this model a Clip Skip of -2 is recommended; -1 also delivers good results, but -2 is the recommendation. As height we say 768 pixels, so we get a portrait image, but the pictures stay small so that we don't have to wait so long for rendering within the scope of this tutorial. Over here I say save the pictures — I always have to think of my thumbnails. And that's it.

Now we can enter our prompts. I'll do "a woman in a pink shirt and yellow trousers", and as negative I take EasyNegative as an embedding, a textual inversion — there's already a video about that too. Within this video I would like to keep these two terms in the negative, because it has already happened that pictures which were not exactly YouTube-compliant got rendered here, which I then had to blur out afterwards, and I just want to prevent that. Okay, now we get to see something. What happened — do I have another one here somewhere? No, it's still on randomize. It didn't do yellow trousers; I would have liked yellow trousers. Thank you. So that's what RevAnimated spit out for us, just as we wanted it: a pink shirt and yellow trousers. And yes, it's a fashion crime, I'm with you, but that's how it's meant to be.

Okay, so that we can use SAM now, we need a few nodes. We need the SAMLoader from the Impact Pack — we don't have to change anything on that node, it's already set up as it should be. We also need the CLIPSeg Detector Provider from the Impact Pack under Util; this is the node that builds on the other node we just downloaded (I'll show a small conceptual sketch of that text-to-mask idea right after this setup). We need the BBOX Detector (SEGS) from Impact Pack > Detector, and we need the SAMDetector (combined) from Impact Pack > Detector. These are the nodes we need first.

What we do now is push them over here and connect everything. In principle there is basically only one way to wire all of this up: the SAM model has to go into the SAMDetector node; the CLIPSeg Detector Provider's detector output goes into the BBOX Detector (SEGS); we have an image input up here and another image input here; and we still have, as we can see, the SEGS, which have to go in here. So now let's try to sort the whole thing out a bit nicely. We'll push that over here — it's a bit awkward that you can't see it like this, but we actually need our image from up here. So. We can still make it a bit tidier; as I always say, the eye renders along with it.

Okay, now we have added the nodes that we need for SAM. One thing I don't think is unimportant here: I played around with the values but couldn't see any direct changes yet. However, I think you should set the detection hint here to mask-point-bbox instead of center-1, so that it works better, because we get our information from the BBOX.

Okay. Now we need another sampler — and another loader as well. No, I'll add a fresh loader; we'll take one from the tinyterraNodes pack again and put it up here. Then we have to tidy up the ones down there a bit; I'll arrange them so that they start together. And we'll take another sampler — we can just copy it — and connect it as well. And what is actually very important here:
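Just to make the text-to-mask idea behind the CLIPSeg Detector Provider more tangible: outside of ComfyUI you can do roughly the same thing with the CLIPSeg model on Hugging Face. This is only an illustrative sketch of the concept — the Impact Pack node wraps the ClipSeg custom node we installed, not this exact code — and the checkpoint name, image file, and threshold here are my assumptions, not values from the video.

```python
# Conceptual sketch of text-prompted segmentation, the idea behind the CLIPSeg Detector Provider.
# This is not the Impact Pack's code - just the same technique via Hugging Face transformers.
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("woman_pink_shirt.png").convert("RGB")  # hypothetical file name
inputs = processor(text=["shirt"], images=[image], return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # low-resolution heatmap for the text prompt

mask = torch.sigmoid(logits) > 0.4  # threshold chosen for illustration, like the node's threshold setting
print(mask.shape, mask.float().mean())  # rough fraction of pixels considered "shirt"
```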
I was just thinking about whether "important" is the right word, but actually it is, because we have to select a model at this point, and up front we had RevAnimated. That may not work so well here, because we're going into the inpainting area now, and there it is better to use dedicated inpainting models. RevAnimated — here we are on its Civitai page — fortunately has one: there is an inpainting variant of RevAnimated, and you can find it up here. You click on the V1.2.1 inpainting version up top — and yes, all the pictures are blurred out, whatever — but that's the inpainting model for it. So download it and put it into the models folder, or rather the checkpoints folder: ComfyUI/models/checkpoints. Put it in there, press Refresh once, and then you can choose the RevAnimated inpainting model here. These models really are specially trained for inpainting work. Well, we take the same VAE as before, and we also set Clip Skip -2 here, because in principle it is the same model.

What we have to do at this point: we see that the very last node down here gives us a mask, and we can look at it in between. We add a Mask to Image node, then we can look at it here, plus a Preview Image. Sometimes it works better, sometimes a little worse, so it's always good as a reference to see in between what comes out of this node.

I just said we are in the inpainting area, and what we need for that comes from ComfyUI itself. Is that under Latent? I have to look again myself — or is it under Advanced, or even under testing? But of course: it is under Latent > Inpaint, the VAE Encode (for Inpainting) node, and we have to put it in between (there is a small conceptual sketch of what this node does right after this part). We have the following inputs at this point: we need an image — we take the picture from the front, because we need our generated picture, we just want to change aspects of it. We need a VAE, which we can simply take from here. And we need a mask, and that's what the node down there gives us. Now I have to move the whole thing a little further to the right so you can see it better. And here we connect the latent into our sampler node. With that we should be able to exchange certain areas of our picture up front on the basis of text.

I'll set the seed to fixed. We will get a new picture now anyway, because of the seed behavior of ComfyUI, but that doesn't matter — it's just so we can experiment a little. What we still have to do: we want to change one aspect of this picture, and I would say the pink shirt does not necessarily go with the yellow trousers. That means we want to exchange the shirt, so we go down here into the CLIPSeg Detector Provider and where it says text, we enter "shirt". And in the second sampler — here, by the way, we can use the same negative embeddings as in our first sampler up front; of course it would be a good idea to store them in text nodes, but we'll leave it as it is — and in the positive up here, since this sampler now only works in the area of the shirt, we say we want "a green shirt". Fingers crossed. If we let that rattle through now, we should actually get a differently colored shirt than the one in the original picture. The BBOX detector is still loading... here it loads the SAM model... the model is loaded, now it continues. We can already see from the mask at the bottom that it recognized the shirt. And tada! We have exchanged the shirt. This was our original picture, and this is the picture now.
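To make it a bit clearer what VAE Encode (for Inpainting) does with our image and mask, here is a rough conceptual sketch. It is not ComfyUI's actual implementation — just the general idea under the assumption of a standard Stable Diffusion VAE with an 8x downscale: the image is encoded to a latent, the mask is grown slightly and scaled down to latent resolution, and it travels along as a noise mask so the sampler only re-denoises the masked region.

```python
# Rough conceptual sketch of "VAE Encode (for Inpainting)" - not ComfyUI's real code.
# Assumes an SD-style VAE with 8x spatial downscale and a vae.encode() returning latents.
import torch
import torch.nn.functional as F

def encode_for_inpainting(vae, image, mask, grow_mask_by=6):
    """image: (1, H, W, 3) in 0..1, mask: (1, H, W) with 1 = area to repaint."""
    latent = vae.encode(image)  # (1, 4, H/8, W/8) for SD-style VAEs
    if grow_mask_by > 0:
        # Grow the mask a little so the seam blends better.
        k = grow_mask_by * 2 + 1
        kernel = torch.ones(1, 1, k, k)
        mask = (F.conv2d(mask.unsqueeze(1), kernel, padding=grow_mask_by) > 0).float().squeeze(1)
    # Downscale the mask to latent resolution; the sampler then only re-noises this region.
    noise_mask = F.interpolate(mask.unsqueeze(1), size=latent.shape[-2:], mode="nearest")
    return {"samples": latent, "noise_mask": noise_mask}
```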
And you can clearly see that only the area of the shirt has been exchanged in this picture. We swap it out based on the fact that we can state textually what we want to replace. We can also say we want to give her blue trousers: then we enter "trousers" down here and "a blue trousers" up here. That should go faster now on the second pass. By the way, this is what the mask looks like — same as with the shirt: the white is the outline of the trousers that it recognized, the black remains as it is, so our second sampler up there only re-renders the white area. Yes, there is not much to say, right? The lady is wearing blue trousers now. Let's compare it again with the picture before: yellow trousers, blue trousers. The style of the trousers has also changed, because the whole area gets rendered anew — everything that is marked white down here. We see the trousers are marked white, and all of that is recalculated. That's why the thought just shot through my head that you might even be able to keep the style of the trousers if you put a Canny ControlNet or something on it, something that generates outlines. I'm keen to try that out, but we won't do it now.

Okay, it also works with other things. If you go in here, for example, and say "a dog playing in the yard" — and down here you say... ah, stupid, it wasn't supposed to be a dog after all. So I want to select the dog, and up here I say I actually meant "a fox". Then you can run the whole thing again. We set this back to randomize — does it need... no, it runs. Okay, then we get our picture of the dog playing in the backyard, the BBOX detector rattles again, and there it goes: we have a dog-shaped mask down here, and all of a sudden we have a fox playing in the backyard. Let's do the whole thing with a cat. New picture, new dog — RevAnimated just makes such beautiful pictures, I am always surprised. So, I will probably have cut the video a little wherever the nodes down here take this much time, so that we don't have to wait. But: a dog mask, and suddenly we have a cat here instead of a dog. Yes, that is SAM, the Segment Anything Model.

And maybe you want to apply that in the image-to-image area. If you want to do that, you can of course go this way: Load Image — I just blanked for a moment. However, you can also... wait a minute, I have to take a quick look. Load Image, yes, this is the standard Load Image node from ComfyUI. I'll go briefly to Pexels and load us a picture — let's take this one, pull it over here, plug it in, and done. And you see, you already have the option up here to create a mask. And how that works: basically you would have a VAE Encode somewhere in between here, then you would put the latent in here, and you would have your image-to-image setup. Down here you would use a denoise of 0.5 or so — 0.35, 0.5, 0.6, depending on what you need. But the mask is the important part here. Let's copy this area down here again so we can see what happens. You have the option to right-click on the node and choose Open in SAM Detector. If you do that, you see the picture, and in this picture you can set points and then press Detect. And that was a bit too strong; if you only set one point — yes, that looks good. It has now selected the background.
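By the way, the "Open in SAM Detector" dialog with its clickable points is essentially SAM's point-prompt interface. If you want to see what that looks like outside ComfyUI, here is a small sketch using Meta's segment-anything Python package; the checkpoint file, image path, and click coordinates are placeholders, and the Impact Pack may wrap this differently internally.

```python
# Small sketch of SAM's point-prompt interface, the same idea as "Open in SAM Detector".
# Checkpoint and image paths are placeholders - download the SAM weights from Meta first.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("pexels_portrait.jpg").convert("RGB"))
predictor.set_image(image)

# One positive click (label 1), roughly where the background is.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[50, 50]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best = masks[np.argmax(scores)]  # pick the highest-scoring mask proposal
print(best.shape, int(best.sum()), "pixels selected")
```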
Probably there are slight color differences; you have to play around with the confidence a bit, but one point was enough. Then you say Save to node down here, and now — if we let it run, I'll just turn off the whole area down below and this node too, let's see if it accepts that — yes. And you see, you have now already created the mask in the image-to-image area and are in principle back where we were down here when we created the mask, it's just inverted: what you select is what gets rendered. In this case the background would be rendered anew, but the lady would stay in the picture. That means you could then pull this mask over, down into the inpainting, and use it that way. If you still see a few impurities — like these stripes at the top and bottom — you can right-click, Open in Mask Editor, and make corrections up here as well. The SAM editor comes from the Impact Pack; the Mask Editor, I'm not sure anymore, it could be that it comes from another node collection, but the possibility exists.

Good. So you can create masks, throw them into the inpainting, and you can use the SAM model. I'm just thinking whether there was anything else. By the way, that's what is shown up here on the Meta AI page as the different input variants: you've already seen that you can also drag and drop, pull rectangles, but you can also work with these points, and you can do textual recognition. That's already pretty cool stuff.

Good, I'll leave it like this so that I can throw it up on Pastebin as a workflow. What I have up here... I'll turn that off. So, and I'll save that as a workflow. Yes, then I wish you a lot of fun experimenting and trying things out, segmenting all kinds of things, swapping things, etc. You can, for example — to stay with the animals — create a picture with an animal in it and run several sampler passes with several SAM detectors behind it, or one SAM model will be enough for the original picture, and create the same picture with a dog, cat, fox, duck, eagle, whatever, crawling and flying through the garden. And yes, I will now also crawl and fly off. I wish you a nice day, and hopefully we'll see each other again in the next video. Take care, and until then.