ComfyUI: IPAdapter v2 Basics | Stable Diffusion | German | English Subtitles

Captions
Hello and welcome to this video, in which I would like to trade a bit of my lifetime for knowledge. There was a huge IPAdapter Plus update, and it is a coin with two sides. The downside is that the old workflows no longer work; it was a breaking change, so we have to rebuild everything. The upside is that we now get almost infinitely many options for integrating IPAdapter Plus into our workflows and getting different variants of images.

I will probably split the next videos up a bit. Today we cover the IPAdapter basics; then I plan to show FaceID, the variant with the embeds, the Tiled IPAdapter and the batched variants. The following videos will probably be a bit shorter, but in this video I will show you how the technology behind the IPAdapter works and how to use it.

For installation, download it via the Manager if you don't have it yet; otherwise you will also get the update via the Manager. It is important to know a few things about the models in advance. The IPAdapter models that can be downloaded here already have the correct names and go into the ipadapter model folder of your ComfyUI. The same applies to the FaceID variants, while the FaceID LoRAs of course go into your LoRA folder. I still have them split into subfolders; you can organize them however you want, that works. The names are already correct, so I would leave them as they are. What is a bit different are the CLIP ViT-H and ViT-bigG variants for ClipVision: if you download them, they are both called model.safetensors. That's why I recommend, or rather it really has to be done: copy the proper name here (Ctrl+C), click on download, paste it (Ctrl+V) as the file name, and save the file in your clip_vision folder. Not the clip folder, the clip_vision folder. Then the whole thing should work. Why? We'll get to that in a moment.

And with that we jump directly into ComfyUI. I load a basic workflow and prepare a few things to show you the application of the IPAdapter. I'll set a seed of 0, set it to fixed, and turn the CFG down a bit. The application of the IPAdapter is now basically pretty simple. We have our IPAdapter node category and can pick the IPAdapter node here. All it needs is a model; we have that down here in the checkpoint loader, so we plug that in. It outputs a model, which we can feed into the sampler. For the IPAdapter itself there is the IPAdapter Model Loader, but we don't take that here; we take the Unified Loader instead. We can already see that it also has a model input, so we wire it in between. Of course we still need a reference image for the IPAdapter, so we add a Load Image node; here I take the woman with sunglasses. And that's basically it. I also want the epiCRealism SD 1.5 model for the demos here. Press Queue Prompt, I deliberately left the prompt as it is, the thing rattles off, and we get an IPAdapter-styled image.

In the Unified Loader we now have the option to let it pick out the right models automatically, based on presets that are much more readable for us. The face models we will only cover in the next video, when it comes to FaceID. Right now we are on Light with a low strength; we can take Standard for a medium strength, ViT-G for a medium strength, or Plus for a high strength, and the IPAdapter Unified Loader takes care of which models are needed for each.
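To make the wiring a bit easier to follow outside the node editor, here is a rough sketch of this minimal setup in ComfyUI's API (prompt) format, written as a Python dict. The class names, input keys, preset string and file names are my assumptions based on the IPAdapter Plus node pack and may differ slightly in your version; treat it as an illustration of the wiring, not a drop-in workflow.

```python
# Minimal IPAdapter graph in ComfyUI API (prompt) format, as a Python dict.
# A value like ["1", 0] means "output 0 of node 1"; node ids are arbitrary.
# Class names, input keys, preset string and file names are assumptions.
basic_ipadapter_graph = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "epicrealism.safetensors"}},      # SD 1.5 checkpoint (placeholder name)
    "2": {"class_type": "IPAdapterUnifiedLoader",                   # picks IPA + ClipVision models itself
          "inputs": {"model": ["1", 0],
                     "preset": "PLUS (high strength)"}},
    "3": {"class_type": "LoadImage",
          "inputs": {"image": "woman_with_sunglasses.png"}},        # reference image (placeholder name)
    "4": {"class_type": "IPAdapter",                                # the simple apply node
          "inputs": {"model": ["2", 0], "ipadapter": ["2", 1],
                     "image": ["3", 0],
                     "weight": 1.0, "start_at": 0.0, "end_at": 1.0}},
    # The MODEL output of node "4" then goes into the KSampler exactly like a
    # normal checkpoint model; prompts, latent and VAE decode stay unchanged.
}
```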
So that means: if we have Light SD 1.5 only (low strength) selected here, the loader also takes care of loading the matching ClipVision model. If we take ViT-G (medium strength), for example, we would have had to select the ViT-G ClipVision model manually in the old version; here that happens automatically. So I'll take Plus (high strength), for example, load that again, and we get our IPAdapter-stylized image.

Interesting to know: we can now also swap in an SDXL model here on the fly. For example, I'll take Proteus, set the latent down here to 1024x1024, press start, and the whole thing still works. We don't have to worry about loading the right models anymore; the Unified Loader takes care of that for us. Of course it is a bit slower, since the image is twice the size to calculate, but you can see the SDXL model is selected and otherwise nothing has changed except our SDXL dimensions. We get the correctly matched IPAdapter model as well as the correct ClipVision model directly from the Unified Loader. For this video, though, I'll go back to 1.5, because that lets us generate more pictures on the fly.

I'll let it load again and show you what else the Unified Loader can do. I'll copy the sampler to make a second one and choose another Unified Loader for it. We can daisy-chain these Unified Loaders, and the chain then makes sure that models are not loaded twice. We need a little more space here, because I need an additional IPAdapter node. So we drag the IPAdapter in, but we take the model from down here, from the loader chain, not from up here, and pass it on through. Now we can say, for example, that we want to use the Standard (medium strength) preset here. Of course we still need our reference picture, and we need our model. So what we have at this point: our loader also gets our base model plugged in down here, but we have chained the IPAdapter inputs. We use a daisy chain, and that is very friendly to our RAM, because the ClipVision model now only has to be loaded once.

And now we can see the differences between the two variants. I'll clean up here again; we don't need all of that. I think this is still set correctly. Actually, it gets much more interesting if we throw away the simple IPAdapter node here and use the IPAdapter Advanced instead. Here, too, we connect everything: ipadapter to ipadapter, model to model. We can already send the model to the sampler, we need our reference image in here, and that should work just as well as before. It is regenerated, but nothing changes in the picture.

Now we can manipulate the weight types here. If, for example, I take reverse in-out, adjust the prompt a bit, say woman at the seaside, sunset, golden hour, adapt the negative a bit too, and let the whole thing run, then we see that with reverse in-out we get a lot more influence from the prompt and less from the image. That means we can also go a little higher with the weight here and then get pretty cool combinations of our input image and our prompt. We also have various other easing functions here. Ease in, for example, means that the IPAdapter intervenes strongly in the UNet at the beginning, while ease out is of course the opposite: there the IPAdapter intervenes more towards the end. And as you can see, we get very interesting, very different pictures here.
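The daisy chain and the Advanced node can be sketched in the same assumed API format. The second loader reuses whatever the first one already loaded by taking the first loader's ipadapter output on its ipadapter input, and IPAdapter Advanced exposes the weight type. Class names, input keys and the exact spelling of the weight-type strings are again my assumptions.

```python
# Daisy-chained Unified Loaders plus IPAdapter Advanced (same assumed API format).
# The second loader reuses what the first one already loaded via its "ipadapter"
# passthrough, so ClipVision only has to be loaded once.
daisy_chain_graph = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "epicrealism.safetensors"}},
    "2": {"class_type": "IPAdapterUnifiedLoader",
          "inputs": {"model": ["1", 0], "preset": "PLUS (high strength)"}},
    "3": {"class_type": "IPAdapterUnifiedLoader",                    # chained loader
          "inputs": {"model": ["1", 0], "preset": "STANDARD (medium strength)",
                     "ipadapter": ["2", 1]}},
    "4": {"class_type": "LoadImage",
          "inputs": {"image": "woman_with_sunglasses.png"}},
    "5": {"class_type": "IPAdapterAdvanced",
          "inputs": {"model": ["2", 0], "ipadapter": ["2", 1], "image": ["4", 0],
                     "weight": 1.2,
                     # other values mentioned in the video: "linear", "ease in",
                     # "ease out", "strong middle", "weak middle"
                     "weight_type": "reverse in-out",
                     "combine_embeds": "concat",
                     "start_at": 0.0, "end_at": 1.0}},
    # A second IPAdapterAdvanced would take its MODEL from ["5", 0] and its
    # ipadapter from the chained loader ["3", 1].
}
```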
I also built a comparison workflow for this, so let me just run it; it is a similar setup to what we just had. It takes a moment to start because the model has to be loaded first, then the samplers run. At the end we get a comparison of the same reference image and the same prompt, changed only in the weight types, showing what kind of influence that has on the image generation. We can already see it here: the results are all similar, but also different. So, the last one runs through; let's take a look at the result. And here we can see that we can get pretty good results just by adjusting the weight type.

What we can also do: we take a second picture — where is it... this one — a second IPAdapter and a second loader and connect everything together. Then we can say, for example, that we want a standard strength here, we want this reference picture for it, we drag the model in from down here, and we let the whole thing run through. Now we get a mixture of both pictures. We could of course give each a different strength, but by regulating only the strength per image we are really just weighting the embeds against each other. Much more interesting is that we can say, for example, we want strong middle here and weak middle there, and then the whole thing combines completely differently. If we take an ease in here and leave the strong middle there, we again get completely different pictures. Here we set reverse in-out, and as we just said, we can turn the weight up a bit so that our prompt comes through a little better. If we now add, for example, wearing a white dress, it changes a bit; we go down a little with the weight, and there our white dress slowly comes out. The background is still taken a bit from the other image, but the combination options here are pretty cool. Of course we can also still set start at and end at individually for the different IPAdapters.

While we are at it, we can also continue to use the attention masks. That means we can say we only want half of the picture. I add a mask blur, because we will need it, and an Invert Mask. So we apply the original attention mask to the first IPAdapter and the inverted mask to the second; we can keep working with masks as before. I'd better go back to strengths of 1 and 1 so we can see things better, and put everything back to linear. Now we should get a very nice mixture, and here you can see quite well that our picture is divided exactly at this point.
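If you want to see the attention-mask split as a graph sketch: each IPAdapter Advanced gets its own reference image, and the two adapters get complementary masks. Node ids "2" and "3" stand for the two Unified Loaders from the previous sketch; class names, input keys and file names are again assumptions.

```python
# Two stacked IPAdapter Advanced nodes, each restricted to one half of the image
# via complementary attention masks (same assumed API format).
masked_graph = {
    "10": {"class_type": "LoadImage", "inputs": {"image": "reference_a.png"}},
    "11": {"class_type": "LoadImage", "inputs": {"image": "reference_b.png"}},
    "12": {"class_type": "LoadImage", "inputs": {"image": "half_mask.png"}},   # white on the left half
    "13": {"class_type": "ImageToMask",
           "inputs": {"image": ["12", 0], "channel": "red"}},
    "14": {"class_type": "InvertMask", "inputs": {"mask": ["13", 0]}},         # the other half
    "15": {"class_type": "IPAdapterAdvanced",                                  # first reference, left half
           "inputs": {"model": ["2", 0], "ipadapter": ["2", 1], "image": ["10", 0],
                      "weight": 1.0, "weight_type": "linear",
                      "start_at": 0.0, "end_at": 1.0,
                      "attn_mask": ["13", 0]}},
    "16": {"class_type": "IPAdapterAdvanced",                                  # second reference, right half
           "inputs": {"model": ["15", 0], "ipadapter": ["3", 1], "image": ["11", 0],
                      "weight": 1.0, "weight_type": "linear",
                      "start_at": 0.0, "end_at": 1.0,
                      "attn_mask": ["14", 0]}},
    # A mask blur node (as used in the video) would sit between "13"/"14" and the adapters.
}
```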
Another point: you may have noticed that there is no noise option anymore. The noise option has moved elsewhere; we now have a separate IPAdapter Noise node. What it does I can show you briefly, and it won't surprise us much: it generates noise in different variants. We can choose between fade, dissolve, gaussian and shuffle here. Shuffle is the variant that sat behind the float noise input in the old IPAdapter, so if you want the old behaviour back, that's shuffle. Otherwise you can play around a bit: we can apply some blur, and we can adjust the strength. That's how we get different kinds of noise. The idea is that the IPAdapter, which was actually trained with black images on the negative side, can be given a noise image here instead, which lets the generation of the pictures run a little more controlled.

As you have seen, we can feed this noise into image_negative; these are the negative embeddings that are influenced by it, and that helps against noise in the UNet. So what do we have here? Our reference picture is now more or less ignored, even though we have Plus (high strength) and shuffle set. I'm just wondering why it is being ignored right now. Ah, of course, small mistake: when we deleted the mask nodes, ComfyUI automatically reconnected an empty attention mask here, and we are not supposed to have an attention mask plugged in here at all. With an empty mask the IPAdapter wasn't applied to anything, so it was completely ignored during image generation. Sorry, I mumbled a bit there. With that fixed, we get our influences again. I'll put the noise back in, and as you saw, we got a sharper picture. What can also help is putting our reference picture into image_optional of the noise node; then our original picture gets filled with noise — you can see it a little better with another noise type, for example if we switch to gaussian — and we get different results. Here we can of course also take the blur out again if we want everything a little sharper, and increase or reduce how much this noise influences the generation of the picture. With that we can get a lot of variants out of the whole thing.

What also still works: we take a second Load Image node, choose this picture again, and add an Image Batch node (I always get the name wrong). So we can of course also mix several pictures together again and push them into the positive input. If we let that run, we get a mixture of both pictures combined: here we have our woman with the dirty face, but with the earrings from the other picture, and a little bit of the background comes through too. The difference to before is that earlier we used two IPAdapters, and now a single IPA encodes the whole thing. And here we can now play with combine_embeds. By default the embeds are concatenated, but we can also add them, subtract them, average them or take a normalized average. Norm average is always a very good value that can give very nice results; add and subtract are more of a playground, but it's worth trying out, depending on which pictures you want to generate. Subtract actually looks pretty cool right now. Now we can go back and change the weight type again: we say reverse in-out and turn the weight up a bit so that our prompt might come through. That's not so nice. Let's try ease in, which means we apply the IPA strongly at the beginning and less towards the end. No. Strong middle maybe? No, that doesn't do much either. Let's stay with linear, but set combine_embeds to average, and then we get an average again. That's what I meant: we can do an incredible number of combinations. Just look at the possibilities of a single IPA; for example, I can decide to intervene in only 25% of the image generation. And now imagine hanging a second one behind it with another model, another weight type and possibly other embeds that are then combined. The possibilities here are enormous.
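As a sketch of this batched variant, here is the same assumed API format with two references going through an Image Batch node, their embeddings averaged, and the noise node feeding the negative side; class names, input keys and file names are assumptions.

```python
# One IPAdapter Advanced fed with two batched reference images, their embeddings
# averaged, and an IPAdapter Noise image on the negative side.
batched_graph = {
    "20": {"class_type": "LoadImage", "inputs": {"image": "dirty_face.png"}},
    "21": {"class_type": "LoadImage", "inputs": {"image": "earrings.png"}},
    "22": {"class_type": "ImageBatch",                       # stacks both references into one batch
           "inputs": {"image1": ["20", 0], "image2": ["21", 0]}},
    "23": {"class_type": "IPAdapterNoise",                   # fade / dissolve / gaussian / shuffle
           "inputs": {"type": "shuffle", "strength": 1.0, "blur": 0,
                      "image_optional": ["20", 0]}},
    "24": {"class_type": "IPAdapterAdvanced",
           "inputs": {"model": ["2", 0], "ipadapter": ["2", 1],
                      "image": ["22", 0],                    # both references at once
                      "image_negative": ["23", 0],           # noise as the negative embeds
                      "weight": 1.0, "weight_type": "linear",
                      "combine_embeds": "norm average",      # also: concat, add, subtract, average
                      "start_at": 0.0, "end_at": 1.0}},
}
```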
Oh, one more thing: I wanted to clamp a noise node in between. Let's take this one, for example, use it as the negative, put a little blur on it, set shuffle as before, and give it a bit more weight. Combine embeds is still on average, now combined with noise. Yes, you can mix it like crazy. So that's basically how you use the IPA, the basic variant, now. There are many, many possibilities.

I just want to give you a few more hints about what you should not do. We know we can chain Unified Loaders with each other. But what can also happen is that we just use a plain IPAdapter Model Loader. You can use it too, but then you also have to load ClipVision yourself and plug it into the IPAdapter node itself; you cannot plug the model loader into a Unified Loader. If I run it like this, with some 1.5 model, it breaks — the error (something about insightface) tells us it doesn't work. The ipadapter input on the Unified Loader is really only there to chain Unified Loaders together.

What you can also do is hang a ClipVision Loader on the IPAdapter node even though you have a Unified Loader. Technically that works and doesn't cause an error, but what happens in the background is that ClipVision is loaded twice: once in the Unified Loader and once down here. I think the one from the loader node is the one actually used, and the other was loaded for nothing. So: if you plug a Unified Loader into the IPAdapter, don't wire ClipVision separately; the Unified Loader takes care of it. If you want to load an IPAdapter manually via the Model Loader, then you do need a ClipVision Loader down here. We take the IPAdapter 1.5 model, plug it in, and it's basically the same as before, we just choose the combination ourselves (this manual path is sketched below). Then it works as usual. Just so you know; it can be a bit confusing that you have the option to chain things that are not meant to be chained. So don't wire it like this — oops, wrong, don't do that — that goes bang.

Well, try it yourself, play around a bit, and see what great combinations you can come up with. And yes, I think we'll see each other again in the next video. I think FaceID will be next, and then we'll see whether it's the embeds, the Tiled IPAdapter or the batched variants. I'll try to get the videos out as quickly as possible. So we'll see you in the next video. I hope it was useful, and bye.
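For completeness, the manual loading path mentioned above — an IPAdapter Model Loader plus a separate CLIPVision Loader instead of the Unified Loader — sketched in the same assumed API format; file names are placeholders for whatever you saved in your models/ipadapter and models/clip_vision folders.

```python
# Manual loading path: IPAdapter Model Loader plus a separate CLIPVision Loader,
# both wired into IPAdapter Advanced. Class names, input keys and file names are
# assumptions / placeholders.
manual_loader_graph = {
    "30": {"class_type": "CheckpointLoaderSimple",
           "inputs": {"ckpt_name": "epicrealism.safetensors"}},
    "31": {"class_type": "IPAdapterModelLoader",                 # file from models/ipadapter
           "inputs": {"ipadapter_file": "ip-adapter_sd15.safetensors"}},
    "32": {"class_type": "CLIPVisionLoader",                     # file from models/clip_vision
           "inputs": {"clip_name": "CLIP-ViT-H-14.safetensors"}},
    "33": {"class_type": "LoadImage",
           "inputs": {"image": "woman_with_sunglasses.png"}},
    "34": {"class_type": "IPAdapterAdvanced",
           "inputs": {"model": ["30", 0],
                      "ipadapter": ["31", 0],
                      "clip_vision": ["32", 0],                  # only needed on the manual path
                      "image": ["33", 0],
                      "weight": 1.0, "weight_type": "linear",
                      "start_at": 0.0, "end_at": 1.0}},
    # With a Unified Loader you would NOT wire clip_vision here, otherwise
    # ClipVision ends up being loaded twice, as described above.
}
```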
Info
Channel: A Latent Place
Views: 1,561
Keywords: ComfyUI, Stable Diffusion, AI, Artificial Intelligence, KI, Künstliche Intelligenz, Image Generation, Bildgenerierung, LoRA, Textual Inversion, Control Net, Upscaling, Custom Nodes, Tutorial, How to, Prompting, IPAdapter plus, IPAdapter
Id: vD30k13HtVE
Length: 22min 38sec (1358 seconds)
Published: Tue Mar 26 2024