ComfyUI: A1111 Prompting in Comfy | Stable Diffusion | German | English Subtitles

Captions
Hello and welcome to this video. How does all of that fit together? I'll show you now. I've been looking for a prompting node that can do a bit more than ComfyUI's standard prompting, and the thing is that ComfyUI and Automatic1111 process prompts differently internally. So if you're just switching over from Automatic1111 to ComfyUI, this will probably help you keep your prompting style, because in Comfy you get slightly different image results even when you use the same seed, the same sampler and everything else. That's why I wanted to introduce this node; it also extends Comfy with a few functions we otherwise don't have as standard. I'm talking about the smZ Nodes. Search for smZ in the ComfyUI Manager and you will find them.

Let's take a look at the GitHub page. In principle there are only two nodes here: one is the prompting node, and there is also a settings node. I didn't look at the settings node in detail, or rather I did, but I couldn't really see the differences, and it is mentioned somewhere here that it keeps changing. Those are experimental features that we don't actually need at this point. But as we can see here, this is the difference between Automatic1111 and ComfyUI: different images can come out of the same prompt. Using ComfyUI's own parsing is fine, of course, we do that all the time, but if you're coming over from Automatic1111, this might help you carry your images over from Automatic to ComfyUI and work in a somewhat more familiar environment.

Today we only look at this Automatic1111 functionality, and I have to briefly explain what we have here. You can also find a link here. Where was it? Here. So, there is prompt editing. We are now on an Automatic1111 wiki page. We can use from, to and when. I'll show you that later; it's really cool, because you can really influence the image generation with it. We also get the BREAK and AND keywords. I'll try to explain those very briefly, because the AND part somehow doesn't work. I tried it out and kept thinking, no, I can leave that out. In principle, if we use the AND keyword, written in capital letters, it is supposed to tie two terms together. So if I wrote, for example, "television AND brown couch", it should combine that into one object and create a brown couch with an integrated television. As I said, that didn't work for me, so I won't go into it any further in this video. You can play around with it; the node supposedly supports it, but somehow it gets ignored later on, or who knows what the sampler or the model does with it at that point.

BREAK is a different story. With BREAK you can control a bit where the chunks end. The CLIP model is the part that translates what we type in, i.e. the prompt, into a representation Stable Diffusion understands, which the Stable Diffusion model then uses to create the image. CLIP converts the text into so-called tokens. I can't tell you exactly whether a token is a letter, a syllable or a word; that's always a bit fuzzy in the AI field, and it also makes price calculations for ChatGPT and the like difficult, for example.
You pay for so-and-so-many tokens there, and they only ever give rough estimates of what a token can be: it might be a syllable of a few letters, but it can also be a whole word. It's just a bit hard to pin down. In any case, these tokens are generated from the text and packed into so-called chunks. With Stable Diffusion a chunk holds 75 tokens. You often see 77 mentioned, but two of those are reserved start and end tokens, so effectively 75 remain for the prompt. If the prompt is longer than those 75 tokens, a break is made at that point and a second chunk of 75 tokens is opened. The BREAK keyword lets you force this: if you write it in a prompt, the current chunk is padded out to its full length and a new chunk is started right there, instead of the text running on until it wraps by itself.

There is a lot of discussion about these keywords, and I couldn't make much sense of it. When testing you notice that a somewhat different picture comes out, but not always, and on Reddit and elsewhere it is debated whether this is useful or not. Technically it does what it says, and the claim is that it can help to separate things meaningfully and pack them into clean, fresh chunks. Try it out. I tried it with colors, a prompt along the lines of "pink hair BREAK yellow shirt BREAK green trousers", because I thought, okay, maybe it helps if the green trousers really sit in their own chunk, the yellow shirt in its own chunk and the pink hair in its own chunk. But most of the time the same color mix-up came out as if I hadn't used it at all. So try it; I had no success with it. Much more interesting are the prompt editing and the alternating words anyway, but we'll get to those in a moment.

So, once you have installed the nodes via the ComfyUI Manager or git clone (please take a look at my ComfyUI Manager video on how to install custom nodes), they are of course available as always after a restart of ComfyUI. What I don't think is so good is that they hide in the normal Conditioning folder. There we have the CLIP Text Encode++; that's this node, and its settings node sits in the Advanced folder. As I said, we won't go into the settings now; it is written here what each setting is supposed to do, but I think that's of less interest for us right now. You can attach the settings node to the CLIP up here and the settings are then carried along, but the creator wrote that this node may change at any time and would have to be replaced again; those are experimental features at this point.

This node, however, is interesting for us. Up here, that's the standard node: we feed in the CLIP and get the conditioning out. It's no different here, but we have a few more settings. First of all, it is important to know that we can right-click the node and hide some of the options, which I'll just do. We don't need the old parser options; they're probably there for backwards compatibility and don't interest us now. Multi Conditioning, the creator says, should be true for positive prompts and false for negative prompts. As I understood it, Multi Conditioning is what enables those AND and BREAK keywords. We'll leave it at the default setting. Mean Normalization: I didn't notice a big difference there.
I think it also comes from the Automatic1111 side, although the author writes that the Automatic1111 developers say you shouldn't turn it on, yet it is on by default somehow. We'll leave it as it is. One thing that is important: if you work with SDXL, you have to switch the node over. You can make the SDXL option visible here and then set it from false to true, and you get the necessary SDXL settings, the Clip G and the Clip L and everything else. But we are limiting ourselves now, so wait a minute, I have to do this and that, and then we can hide it again. We are limiting ourselves to this basic functionality. I'll just load the node again; that will be our starting point, and now we'll do a little test with it, because there are two genuinely cool things in here that I think are really, really good and that also open up completely different creative possibilities.

Let's take a loader. I'll write my standard terms into the negative prompt, and now I'd like to have two samplers. We set them to preview, leave them on Euler, leave everything else as it is, and copy the whole thing down. For the comparison we make the preview nodes a bit bigger here. What we do now is extend the prompting: we convert positive to an input here, and text to an input here. Then we create a text node and connect both to it, so both paths are driven by the same prompt. Down here we still have to connect the pipe, and here the positive, because we want to see the differences. We also want the seed to be the same on both; that is important. Double-click on it, we get a primitive with the seed, set it to randomize, connect it down here, and done.

Okay, what we can do now is check whether we get the same picture twice. The upper sampler, i.e. the left picture, already goes through the CLIP Text Encode++; it is connected up here to the optional positive input. Of course I also have to connect the CLIP, then I can change the model directly as well. So here we have the CLIP Text Encode++ and here the normal ComfyUI node, and we should now get the same picture. Exactly, we have the same seed, and we told the text parser up here to do ComfyUI text parsing. That works. Now let's switch it to A1111. I'll leave Multi Conditioning on, but as I said, that's for AND and BREAK. At this point we still get the same picture. If we now turn off Mean Normalization and have a look: okay, they are still consistent.

But now we come to the things that can't be done otherwise, and those are the alternating words and the prompt editing. With prompt editing we can say from, to and when. The syntax is: at the front we open square brackets and say from a woman to a man, from step 10. We have set 20 steps back here. I'll pull it out here again so you can see it a bit better. That's the from-to. And while the old ComfyUI parsing now gives us a man and a woman, we have created a picture where the sampler rendered a woman for ten steps and then a man for the ten steps after that. You can see that quite well if you watch this sampler here: from the middle on, the picture of a woman turns into a man, while the old parsing still produces a woman and a man. And that's pretty cool, actually.
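Written out, the prompt editing syntax I just typed looks roughly like this; the last value is the step at which the switch happens:

[a woman:a man:10]

With the 20 sampling steps we set, the first 10 steps are conditioned on "a woman" and the remaining 10 on "a man".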
I'll also expose the steps as an input: steps to input here, steps to input here as well, double-click, and then we can set the steps comfortably for both samplers. I'll go up to 70 steps here, and then we should be able to see it quite nicely. With 70 steps it makes little sense to switch the whole thing at step 10, of course. We could now enter 35, i.e. 50 percent of 70 steps, but we can also just write 0.5, which means it should switch at 50 percent of the sampling. Now let's watch what the preview does here. No, I'll look down here again. Oh, now I've hooked the seed into the wrong place; that didn't work, of course. So, let's do it again. We can already see that it started rendering a woman, you can see that here, and there was the jump where it changed towards the man. The clothes don't match so well now, but I think the principle comes across quite well. Here it really rendered only one person, and bang, there was the jump. Down below it only renders one person now as well, maybe because the steps are longer now. Let's just check whether it makes a difference if we enter 35 here instead. Yes, that makes a difference, at least in the old prompting; maybe the Comfy parser interprets the number as a weight. But here we can see quite well that it started out female and then switched to male. Okay, that's the from-to-when.

With from-to-when we can also bring certain things in from a certain moment. For example, if we say we want the Milky Way, we take the syntax again and say at this point we want mountains to appear, let's say from step 25, and after that an active volcano from step 45. You have to look closely, but you can see what it does with it. This time it renders the lower picture first; let's compare. There is the galaxy, there are the mountains. Let's see if it manages the volcano too. It didn't quite get to the volcano this time, there the volcano starts to emerge, while down here, of course, everything is rendered at once. But that's how you can control it quite nicely, actually.

We can also say something like: landscape, mountains, lake, and we want a pencil sketch from the start. No wait, not from the start; we say the pencil sketch should change into a watercolor drawing from step 40. We don't need the rest anymore. Down here we get a mix of everything from the beginning again, that's how ComfyUI interprets it, while up here we now have the pencil sketch first and then the colors are added a little. And here you can already see the difference that makes. You can also tell it to add the colors a bit earlier, let's say from step 30. Down here everything is generated together from the start again; up here we start with a pencil sketch, now the colors come in, and we get this mixture here. And that changes what kind of pictures you get out of it. You could also say we want to switch from photorealistic to ink drawing now. Down here we get everything mixed together again. Okay, maybe that switch came a bit too late; maybe we have to do it earlier, let's say at 0.25, i.e. at a quarter of the 70 steps. Yes, the ink drawing doesn't come out that well, but I think that's a question of how the model is trained and how you design the prompt. By the way, there is a pretty cool example prompt on the page linked from the GitHub readme, and there it is also described again what the whole thing does.
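To sum up the variants used in this part, written out they look roughly like this; a value of 1 or more is an absolute step number, a value between 0 and 1 is a fraction of the total steps:

the Milky Way, [mountains:25], [active volcano:45]
landscape, mountains, lake, [pencil sketch:watercolor drawing:40]
[photorealistic:ink drawing:0.25]

The two-part form only adds a term from the given step onward. As far as I know there is also a form with an empty second part, like [mountains::25], that drops a term again after the given step, but we are not using that here.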
I'll copy that example prompt from the GitHub page; below it it says what will be rendered at which point, and it says the sampler has 100 steps. Let's do that. I'll paste the prompt in with Ctrl+V and set the sampler down here to 100 steps, and we'll compare with the original, i.e. the Comfy, picture in a moment. We're now waiting for this sampler to start and watching what happens along the way. Well, you can clearly see this jumping around, this switching between the stages, and what we got at this point is the quite normal version on one side against a very fantasy-like version on the other. I think it's really cool that you have the possibility to control this and to tell the sampler when it should start rendering things, or when it should completely abandon what it is currently rendering.

Okay, the next feature is the alternating words, and that's really cool. Here we see the syntax; it is square brackets again, similar to what we already know. We write a, then square brackets, and then cow, bear, mouse, cat, dog, man, woman, separated by pipes; I'll write the syntax out again below. And what it does with this, good point by the way, I'll turn the steps down, let's say we have 30 steps now, is that with every step the sampler makes, it iterates through these terms. That means step 1 is rendered with a cow, step 2 with a bear, step 3 with a mouse, then cat, dog, man, woman, and when it has arrived at the end of the list it jumps back to the top and iterates through the terms again. Oops. And that goes on until all the sampling steps are completed. Let's take a look at that, because something like this ends up as a wild mixture. We'll compare it with the lower picture in a moment, but we can watch how the sampler, oh god, can someone please fix this, how the sampler now jumps back and forth between all these motifs and creates very strange things. Down here it also interpreted something interesting, which is completely over the top.

We can also say, for example: woman with brown, blue, red hair and pigtails, afro look, dreadlocks, each as an alternating list, and now it jumps around like crazy through these steps. Down here we get a complete mixture again, and up here we see it going absolutely wild, and the result is different from what we get with normal prompting. So you can, for example, add a bit of flickering, switching between mixed terms, not joined with AND, i.e. not AND brown hair, AND blue hair, AND red hair, but simply letting the same slot switch terms at each step. And I find it very interesting that we have this possibility here.

Yes, these are the things you can do with this node. I think that's pretty good. Maybe prompting will become a bit easier for one or the other of you if the style of the images that come out matches what you know from Automatic1111 better. Otherwise you can keep using the Comfy parsing here, but then you don't really need this node, because the Comfy parser can't do these extra things anyway. Oh, the mouse wheel has flipped the dropdown again. Comfy is the standard parser, you don't have to switch it, but if you want to stay in the Automatic1111 style and use these additional features, then you can just clamp this node in between and have fun with it. Yes, I wish you a lot of fun trying it out.
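As a quick reference before you rebuild this, the two alternating prompts from this part written out look roughly like this; one list per pair of square brackets, terms separated by pipes, and the active term advances by one with every sampling step, wrapping around at the end of the list:

a [cow|bear|mouse|cat|dog|man|woman]
woman with [brown|blue|red] hair and [pigtails|afro look|dreadlocks]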
Rebuilding this is not complicated, and then it's just a matter of playing around a bit with the terms and the sampler so you can control it even more precisely and tell it: hey, render a Christmas tree for me from step 10 or so, and before that render an empty bottle. Let's try that out. We take the from-to again and say from empty bottle to Christmas tree from step, how many do we have, 30, let's say 12. Yes, we got a bit of a Christmas bottle here. But down here, and that's the difference, we got both: we now have the Christmas tree in the background, the bottle in the foreground, and a Christmas tree inside the bottle. It is simply a different way of steering the sampler. Have fun trying it out, I wish you all the best. See you in the next video. Bye!
Info
Channel: A Latent Place
Views: 949
Keywords: ComfyUI, Stable Diffusion, AI, Artificial Intelligence, KI, Künstliche Intelligenz, Image Generation, Bildgenerierung, LoRA, Textual Inversion, Control Net, Upscaling, Custom Nodes, Tutorial, How to, Img2Img, Image to Image, Model to Model, Model2Model, Model Merge, A1111, Prompting
Id: KTVhh8t3EsE
Length: 28min 48sec (1728 seconds)
Published: Fri Oct 13 2023