ComfyUI - Prompt Engineering with CFG, Sampler Steps and Clip Skipping - for Stable Diffusion Users

Video Statistics and Information

Captions
We need to take another look at prompting, or the auto-prompting, and in this video we are going to look at how the images actually come to be. We have created a lot of them, but what is actually happening? Take a look at this one: a 512 by 768 image, I think, with two bottles. I asked for a fairy; it has given me a girl standing there, with some lavender inside one of the bottles. How does it get to this? How do we end up with these images? They look fantastic, but where do they come from?

Understand that every time we start a new image with Stable Diffusion, it begins with something like this: a grey image with random noise, small random differences in how it looks, rather like a slab of concrete. That is the starting point for all the images. When we actually apply the prompts, we end up with something that looks nothing remotely like a noisy grey slab; we end up with amazing images.

In this video I want to discuss things that are not the prompts themselves, but the settings that work alongside the prompts. I want to give you this idea: when you create an image, you are like the boss of a business that produces booklets or brochures, and you are running a team of employees. Some of them do design work, some prepare the paints for printing, some actually do the printing, some put the books together after the printing is done, and some do the proofreading to make sure everything looks good. You are, in some ways, the maestro, the company boss running the image processing inside Stable Diffusion. When things go right you take the credit; when things go wrong, you have to be the one who fixes them. But how do you actually know when, or where, to fix problems when they occur? And speaking of problems, take a look at this image: there is a lump of wood lying around almost at random. So how do we fix problems when they occur inside images?

There are three areas we need to look at where we have some degree of control and interaction with the artificial intelligence that is creating the image. I am going to talk about this only briefly, because it looks scary: this is a diagram from the early days of latent diffusion, or stable diffusion, whichever you want to call it. Stable Diffusion is about a year old, and this diagram describes the AI that works inside it. We have latent space, where a whole bunch of the work happens; we have what they call conditioning, with the semantic map, text representation and images; and we have pixel space. Normally when we work with graphics on computers we are working with pixels, pushing them around inside Photoshop and using all sorts of parameters to change the image. Here, instead, we have a group of artificial intelligences working together to produce the images you want: some of them understand what you are telling them through the prompts, and some of them denoise the image.
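To make the "grey slab plus noise" idea concrete, here is a minimal sketch of how a latent diffusion run begins, written with PyTorch. This is not ComfyUI's actual code; the `fake_denoise_step` helper is a stand-in for the work the U-Net, prompts and scheduler do on each step. The point is only that the process starts from seeded random noise in latent space and is refined step by step.

```python
import torch

def fake_denoise_step(latent: torch.Tensor, step: int, total_steps: int) -> torch.Tensor:
    # Stand-in for the real work: on each step the U-Net predicts noise, the
    # prompts steer that prediction, and the scheduler removes a little of it.
    # Here we just shrink the tensor so the sketch runs end to end.
    return latent * (1.0 - 1.0 / total_steps)

def start_generation(seed: int, steps: int, width: int = 512, height: int = 768) -> torch.Tensor:
    """Every run begins as seeded random noise in latent space -- the 'grey slab'."""
    generator = torch.Generator("cpu").manual_seed(seed)
    # SD 1.x latents have 4 channels at 1/8 of the pixel resolution.
    latent = torch.randn((1, 4, height // 8, width // 8), generator=generator)
    for step in range(steps):
        latent = fake_denoise_step(latent, step, steps)
    return latent

print(start_generation(seed=24, steps=15).shape)  # torch.Size([1, 4, 96, 64])
```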
Those denoising networks take that grey, concrete-slab-like image they start off with, denoise it, and produce the image we eventually see. All of these artificial intelligences are trained to be among the best at their jobs, but they are different entities. They are not like a human team where everyone talks to each other, holds meetings, and tries to reach the same understanding of the problems. The only way these things can really work together is if you take command of them: understand them, take command, and then give them the right instructions.

There are three very important ways of interacting with these artificial intelligences: the steps inside the sampler, the CFG scale, and what is known as clip skip. Most of the time when we work with prompts we are working inside the positive and the negative prompts, and this is what that looks like inside the Efficient Loader and the efficient sampler nodes. The positive and negative prompts are usually where we spend most of our time, but you really need to understand how they interact with the CFG, the steps and the clip skip, and that is what this video is about.

Now, this image here is described by nothing but negative prompts. I have told the software I don't want text, watermarks, ugly, disgusting, disfigured or nude, and I have not put in any positive prompt yet. Using the DreamShaper model, the software has created an image that it thinks is the opposite of all those things: if you start from that random grey noise and eliminate those things over 15 steps, you end up with this. The problem is what is going on in it. There is a tree coming out of the hut, and there is this little piece of wood lying around. I said "disfigured" in the negative prompt, but this is definitely disfigured, and we want to try to fix the image.

If I wanted to fix this image, there are a couple of things I could do. The first would be to start ranting and raving inside the prompt, asking for houses without trees growing out of them, or putting "damaged houses" into the negative prompt. Let's not do that. Let's be a little more precise and figure out how the steps, the CFG and the clip skip are going to help us.

The steps are set at 15. The CFG is at five, which is lowish; it is not super low. The CFG controls how much attention is paid to the text here, the prompts. When we write prompts for Stable Diffusion, the software turns them into what are known as tokens, and it is really those tokens that it looks at when it tries to develop the image. Tokens are difficult to explain without getting very technical, but they are the software's way of understanding language, and by understanding the language it tries to understand your intentions. So the prompts are very important, but in this situation we don't have a positive prompt, so how do we fix this without doing more prompt work? Well, we could go to the steps and change the steps value. The steps do what has sometimes been described as looking at the negative prompt, taking that grey image, and removing anything that matches the negative prompt.
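On the tokens mentioned a moment ago: the text encoder in Stable Diffusion 1.x is a CLIP model, and its tokenizer chops the prompt into a fixed-length window of token ids before anything else happens. Here is a small sketch using the Hugging Face tokenizer; downloading the tokenizer on first run is assumed.

```python
from transformers import CLIPTokenizer

# SD 1.x uses OpenAI's CLIP ViT-L/14 tokenizer; fine-tuned checkpoints inherit it.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

negative_prompt = "text, watermark, ugly, disgusting, disfigured, nude"
encoded = tokenizer(negative_prompt, padding="max_length", max_length=77,
                    truncation=True, return_tensors="pt")

print(encoded.input_ids.shape)  # torch.Size([1, 77]) -- the fixed 77-token window
# First entries: a start-of-text marker followed by word and punctuation pieces.
print(tokenizer.convert_ids_to_tokens(encoded.input_ids[0])[:10])
```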
At the same time the sampler takes the positive prompt and tries to make sure that the grey image evolves into something that resembles it, and the number of times it does this is determined by the steps value. That is a simple way of understanding it. The CFG scale is how much the sampler should look at the prompts. It goes up to 100, and the higher the CFG scale, the more the sampler pays attention to the prompts. In this situation it is only looking at the negative prompts.

So here is what I did. Knowing that this particular model is a good one and does not produce weird-looking images most of the time, I decided to tackle the problem by increasing the number of steps. When we doubled the number of steps to 30, this is what we ended up with: everything that was wrong has basically been fixed. The out-of-place pieces of wood have turned into beautiful bushes, the roof that was looking a little awkward has turned into a much neater structure, and the tree that was growing out of the hut has also been fixed. That is pretty good, and we achieved it just by increasing the number of steps. When we increase the steps, we end up with more of what we actually want, based on the sampler's ability to keep going back to the model and asking how to produce what it has been asked to produce. You are asking the software to use all of its intelligence to create the image you asked for, and the more steps it takes, the more intelligently it behaves.

The CFG remains fairly low here, and increasing it probably would not have helped much because, for the reasons I explained, we are not really saying what we want to see. If I had asked for a beautiful house in the woods, then maybe increasing the CFG would cause the software to pay more attention to the prompt and maybe we would get a better response without needing to increase the steps. Remember that whenever you increase the steps, from 15 to 30 in this case, you double the amount of time everything takes.

In this next image, all I did was change the seed, so I changed the noise the image starts from. First it produces some houses in the woods, and they look okay, and then it produces this beautiful, really robust-looking vehicle, the kind of vehicle you would find in a very snowy district. It is amazing that it does that without any real prompting from me. I have only told it what I don't want to see; it has looked at that grey noise and produced something that looks pretty awesome. You could really begin to ask who is doing the genius work here, me or the software. I think this is a fantastic-looking image. If we reduce the steps to 15, we end up with what looks like a car in the snow, so it has definitely decided there is going to be a vehicle involved, and when we increase the steps to 20 we begin to see something that resembles the vehicle we saw at 30 steps, the one designed for moving around in the snow. The only things we changed were the seed and then the number of steps.
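Before moving on, it is worth pinning down what the CFG scale actually does on each step. On every sampling step the model makes two noise predictions, one conditioned on the positive prompt and one "unconditional" (which, in practice, is where the negative prompt goes), and the CFG scale decides how far to push the result away from the unconditional prediction and toward the conditional one. A minimal sketch of classifier-free guidance, with random tensors standing in for the model's outputs:

```python
import torch

def apply_cfg(noise_uncond: torch.Tensor, noise_cond: torch.Tensor, cfg_scale: float) -> torch.Tensor:
    """Classifier-free guidance: push the prediction toward the (positive) prompt
    and away from the unconditional / negative-prompt prediction."""
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)

# Toy stand-ins for the two U-Net predictions on one sampler step.
noise_cond = torch.randn(1, 4, 64, 64)    # prediction given the positive prompt
noise_uncond = torch.randn(1, 4, 64, 64)  # prediction given the negative / empty prompt

low = apply_cfg(noise_uncond, noise_cond, cfg_scale=5.0)
high = apply_cfg(noise_uncond, noise_cond, cfg_scale=30.0)

# Higher CFG amplifies the difference between the two predictions, which is one
# reason very high values tend toward over-saturation and harsh contrast.
print(low.std().item(), high.std().item())
```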
In this next one I have kept everything the same but reduced the CFG and the steps: the steps have gone down to one and the CFG has gone down to 0.5. This means we are barely paying attention to what is in the prompt, and with only one step we have barely moved on from that original grey, noisy image. I have then added a couple of new words, yellow and blue, and just by adding those two words we get yellow and blue appearing here. Compared with the previous image, which had very low steps and very low CFG, I have then increased both: we have gone back to a CFG of five and 15 steps.

Here, what has happened is that I have kept everything the same but changed the model. When you are not getting the results you want, it sometimes makes sense to change the model rather than changing everything else. Keep everything else the same, including the seed, and that can sometimes produce exactly what you are looking for. Changing the model tends to produce the largest amount of change when you have a lot of prompts; here we have just "yellow" and "blue", we have changed the model from DreamShaper to Colorful, and it keeps roughly the same sort of design. In other words, most of what is happening here is coming from the prompts and from the noise, and the noise has a huge impact on what we actually get.

We have gone back to DreamShaper here, and what we are going to do is add another word to the prompt: "beautiful". Someone looking at this image would conclude that I had added the word "flower". If they tried to guess what words I had used to describe it, they would say "flower", "blue and yellow flower", "sky", "clouds". They would not be able to guess the words I actually used, and they would not guess "text, watermark, ugly, disgusting, disfigured, nude" as the negative prompts. The software has an ability to produce images that cannot easily be traced back to a specific combination of prompts, and that ability, where you get an output that cannot readily be replicated by guessing the prompts, is probably one of the most useful when you are working professionally with the software and don't necessarily want to give away your secrets. We saw earlier that there was a workflow-saving option, which I think I still have here: we can export images without the embedded workflow using this extension (there is a small script sketch at the end of this section showing the same idea). That is one thing you might want to do if you want to keep your prompts secret, and I think there are good mathematical reasons why the prompts cannot be completely reproduced just by looking at the image.

Now I want to move on to a different image. We are going to go down a couple of steps and look at this lighthouse. I have changed the prompt to "lighthouse", we are using DreamShaper as the main model, and we are looking at 30 steps and a CFG of six. By my reasoning, this image looks pretty nice for 30 steps with a CFG of six: the colours, the detail, it looks really nice, and it is following my instructions, no text, no watermark, none of that, just a lighthouse. It is really quite remarkable how the software can produce a holistically sensible image from a single word.
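Coming back to the point about exporting images without the embedded workflow: ComfyUI normally stores the workflow and prompt as text chunks inside the PNG it saves. If you prefer not to rely on an extension, a small script can also strip that metadata by re-saving the pixel data only. This is a sketch using Pillow, and the file names are placeholders.

```python
from PIL import Image

def strip_workflow_metadata(src_path: str, dst_path: str) -> None:
    """Re-save a PNG with pixel data only, dropping text chunks such as the
    embedded ComfyUI 'workflow' and 'prompt' metadata."""
    with Image.open(src_path) as img:
        clean = Image.new(img.mode, img.size)
        clean.putdata(list(img.getdata()))
        clean.save(dst_path, format="PNG")  # no pnginfo passed, so no text chunks

# Placeholder file names for illustration.
strip_workflow_metadata("ComfyUI_00001_.png", "lighthouse_clean.png")
```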
It figures out "lighthouse" and it also figures out a whole bunch of things that go with a lighthouse. Keep an eye on the colours, because we are going to be watching the colours and the contrast from here on. In this image the colours and the contrast are good: we have a nice healthy red and white, we have some contrast, and we have a sort of grey, cloudy sea.

Let's move on to the next image. We have kept quite a lot of the parameters the same: we started off here and moved on to this image, and all that really changed was the model, and changing the model changed the design significantly. This one, for whatever reason, looks a little less cloudy and a little more contrasty; there is more colour contrast there. Next, I have reduced the CFG while keeping everything else the same, and notice that everything remains relatively good; nothing bizarre happens just by reducing the CFG to 3.5. Now we have reduced the steps, taking them from 30 down to one, and when we take the steps from 30 down to one we lose the structure, we lose the meaning inside the image. It is almost as though the steps are essential for bringing out meaning in these images. Interestingly, the colour is already there, even at step one: we have the blue from the sky, and something that begins to look like a white house. That is interesting: at one step we already have something that begins to resemble the final scene.

Let's go to 251 here. We have increased the steps to two. We still can't tell exactly what is there, but we have lots and lots of saturation and lots of contrast, far more than we started with, and the main difference here is not the change in the number of steps. The main difference is the change in the CFG: we have gone from about 3.5 to 12.25, and the image has changed drastically just from changing the CFG, with more contrast and more colour. Next we have increased the steps from two to five, I think, and that increase produces a meaningful image, but we still have this crunchy detail and these colours that are simply too much, and that is at a steps value of just five. When we increase the steps to 26 things improve, the colours improve, but it is still crunchy; there is still too much going on, too much saturation, too much contrast. It looks as though the CFG number is what is giving us that saturation and contrast: every time we have seen it, the CFG has played a role in producing it.

Going a bit further down, I have reduced the CFG, and I think the image now looks a lot better; it loses some of its ugliness. When we increase the CFG to 32 it just becomes horrible: lots and lots of contrast, lots of horrible colours, and all sorts of weird detail. Reducing the CFG and reducing the steps, we get something that almost resembles that completely grey image we start off with. What I am trying to demonstrate here is how important the steps and the CFG are in producing the image. Remember that the prompts are not changing, and in the last five or ten images we have not changed the seed or the model; all we are changing is the steps and the CFG.
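If you want to put numbers on the "high CFG means more saturation and contrast" observation rather than judging by eye, a small script can compare renders. This is a sketch, not part of ComfyUI: it reads two output files (placeholder names) and reports mean saturation and a rough luminance-contrast figure.

```python
import numpy as np
from PIL import Image

def saturation_and_contrast(path: str) -> tuple[float, float]:
    """Rough image statistics: mean HSV saturation and standard deviation of
    luminance, both useful for comparing renders made at different CFG values."""
    img = Image.open(path).convert("RGB")
    hsv = np.asarray(img.convert("HSV"), dtype=np.float32) / 255.0
    grey = np.asarray(img.convert("L"), dtype=np.float32) / 255.0
    return float(hsv[..., 1].mean()), float(grey.std())

for name in ("lighthouse_cfg3.5.png", "lighthouse_cfg12.25.png"):  # placeholder files
    sat, contrast = saturation_and_contrast(name)
    print(f"{name}: saturation={sat:.3f} contrast={contrast:.3f}")
```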
Increasing the CFG creates even more contrast. We are at 18 now, and the contrast is striking; it is the thing that really stands out. We have dark colours here, very bright, very saturated blue skies, and contrast even within the sky. What is going on is that as we change the CFG, the main thing to take away is that the CFG seems to be involved in very high levels of contrast: low CFG, very low contrast; high CFG, high contrast. Increasing the steps then seems to improve the situation somewhat, so increasing the steps even with a high CFG seems to produce a better outcome. That is something of a takeaway: even when the CFG level is very high, increasing the steps level might actually rescue you.

Let's go to the next one, which uses a very low CFG; absolutely everything else remains the same. Notice that we now have black and white. I actually think the image looks rather beautiful: at a very low CFG it looks like a pencil sketch, it is black and white, and I really like it, I think it looks awesome, but it is black and white. So: high CFG, high contrast; low CFG, low contrast and low colour, and the low colour here means I can't see any colour at all. It has not moved away from that total greyness, even though we have a steps value of 28. The thing that really impacts the saturation, how much colour there is, is the CFG; despite the moderately high steps level we still keep the grey. The CFG and colour are intertwined in a way that is a little surprising, but it is something you really need to take notice of. And with a low CFG and low steps we end up with that mid-grey colour. Hopefully at this stage you have understood what I am getting at: the CFG has a big impact on colour and contrast.

We now have a fairly low CFG and a fairly low number of steps, still using the same model, and we get these nice muted colours: a bright red, yes, sure, but fairly muted blues and reasonable contrast rather than excessive contrast. We change the model and things remain roughly the same; the composition remains almost identical, and that is sometimes a really useful feature when you are using just a few words, a limited number of prompts. In my experience, when you increase the number of prompts you tend not to get such equal results when moving between different models. So all we did there was change the model, and the composition stayed the same. Then, if you look at the seed, we go from 24 to 25, and the result is a complete change. Changing the noise, even by one number, can radically change the result that gets rendered, and generally speaking that is the same whichever version of Stable Diffusion you are dealing with.

Moving on, all we are changing now is the model: we are moving from DreamShaper to Skinrate. Skinrate is one of my models, one that I created, basically because I really liked two different models but they would sometimes give me weird results, and when I merged them into a single model they actually worked a lot better. Skinrate was one of those merged models; there was another one as well.
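Merging two checkpoints, as described for the Skinrate model, is usually done by taking a weighted average of the two models' weights; ComfyUI and other UIs have nodes for this. The sketch below shows the underlying idea with plain PyTorch. The file names and the 0.5 ratio are assumptions for illustration, not the exact recipe used for Skinrate.

```python
import torch

def load_state_dict(path: str) -> dict:
    raw = torch.load(path, map_location="cpu")
    # .ckpt files often wrap the weights in a 'state_dict' key.
    return raw.get("state_dict", raw) if isinstance(raw, dict) else raw

def merge_checkpoints(path_a: str, path_b: str, out_path: str, ratio: float = 0.5) -> None:
    """Weighted merge: out = ratio * A + (1 - ratio) * B for every shared tensor."""
    sd_a, sd_b = load_state_dict(path_a), load_state_dict(path_b)
    merged = {}
    for key, value in sd_a.items():
        if key in sd_b and torch.is_tensor(value) and value.shape == sd_b[key].shape:
            merged[key] = ratio * value + (1.0 - ratio) * sd_b[key]
        else:
            merged[key] = value  # keep A's value where the models don't line up
    torch.save({"state_dict": merged}, out_path)

# Hypothetical file names, just to show the call shape.
merge_checkpoints("model_a.ckpt", "model_b.ckpt", "merged_model.ckpt", ratio=0.5)
```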
Notice that as we move from 9 steps to 31 steps, we get a radical change in the landscape. Remember what the steps are doing: they look at the prompts, and they work with the CFG, and they are saying, okay, how do we produce what this person asked for, given that we have this model as the place to get our inspiration from? Between 9 and 31 steps there is a huge change, and then when we go up to 55 steps there is a significant change from 31 to 55 as well, but I think you will agree that the change from 9 to 31 is a lot bigger than the change from 31 to 55. That is something I generally find to be the case: it is the changes that happen at low step counts that have the biggest impact on the final image you see. From 31 to 55 we don't get a huge amount of change; the windows move around, this little detail on the house shifts a bit, but the lighthouse itself keeps roughly the same shape. So when you go above roughly 30 steps with Stable Diffusion 1, things don't change that much. That is generally the case, although in certain situations where you have a lot of text in the prompt, things can continue to change quite a lot even above 50, 60 or 70 steps; you can see very significant changes as you go up to 100 or 120.

We go down to 22 steps, everything else remains the same, and the image is still very usable even at 22. Then we keep the steps at 22 and increase the CFG to 10, going from about 4 to 10, and that one change produces a very different looking result. Somehow, by getting the CFG to pay more attention to our text, our prompts, we get a fairly different image from the one we had before. Notice also the colours and the contrast: the colours have become slightly more saturated, and if we zoom in you can see more contrast throughout the image, more colour. That is something I usually find I don't like; I usually don't like this kind of result, and this is still a fairly low CFG, just 10. At a CFG of 25 we are beginning to get cartoons: it looks like a cartoon, and it looks ridiculous. Then when we increase the CFG to 50, and remember it goes all the way up to 100, it looks like a child designed it, what I would call naive. It is so dark that the software has put a sun in to try to make things look logical: it has produced such deep shadows that it has produced a sunset to justify them, and I have to tell you, even at sunset I don't think we see a sea that shade of blue. At 50, the saturation and contrast are off the scale, really off the scale. We did change the steps a little there, but most of the change comes from the CFG.

And then we have this, which is almost cartoonish in nature, but it looks good. We have the lighthouse, we have another lighthouse, and we have what looks like a rocket taking off. It looks like a rocket, doesn't it? But it is actually a lighthouse, and it is in the sky, and it looks like one of those spaceships billowing out all of that fog, smoke and cloudy gas they give off as they launch.
Why do we have this? We have it because I put "lighthouse, surreal", surrealism being a form of art, and "rocket ship" in the prompt. I wanted a surrealistic image where we have a lighthouse that could perhaps be taking off like a rocket, and somehow it figured out that this was what I wanted and gave me this image. This next one is not hugely different from what we had before: I have added a couple of words, given it a sensible CFG of 8.75, and given it 155 steps. That change to the number of steps is what allowed it to produce this really amazing image, something that looks genuinely artistic, exactly what I was asking for, something surreal: not a rocket ship, but a lighthouse taking off. We could title it something like "the USS Lighthouse takes off on the horizon in the early morning". The point is that the software was able to create something that really met all of my expectations, but we got there with a very high steps number.

I have tended to emphasise that I like low step counts, because when you have a low number of steps everything processes quickly: you can test more parameters, more prompts, more models. Finishing quickly gives you degrees of freedom. But for your final image I do sometimes recommend increasing the number of steps to a very large number, particularly when you are dealing with concepts and ideas that are fairly complicated. That will sometimes give you a better result than you can get at around 30 or 40, and with some models and some prompts you can go very high and still get significant improvements in the rendered image.

The other thing I promised to talk about is clip skip. Clip skip operates on the part of the neural network known as the CLIP network, and the CLIP network is the network that analyses the prompts: it is involved in taking the prompts and turning them into the tokens that the software actually works with when producing your image. Clip skip is a way of cutting short how far the software processes the text when it is trying to understand what you told it (there is a small sketch of this idea after this section). Some people will suggest that when you are working with a particular model, say DreamShaper, you should use a clip skip of minus two. What that would mean, if that were the recommendation, is that the author of that model does not want you to process the text part, the prompt, all the way through.

Remember, too, that the low steps number earlier gave us a surreal image, something quite bizarre, even though I had not put "surreal" in the prompt box. And if you think about it, sometimes what you actually want is not something perfect: you want something broken, something weird, something that is maybe even disfigured. So sometimes you may want to take some of these negative prompts that I almost always use and transfer them to the positive prompt, because for your type of workflow those terms might actually be preferable as positives, and sometimes low step counts may be a positive too, because things don't look polished at low step counts; they look kind of broken.
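For readers who want to see what "not processing the prompt all the way through" means, here is a sketch of the clip skip idea using the Hugging Face transformers CLIP text encoder. A clip skip of 2 corresponds to taking the hidden states from the second-to-last transformer layer instead of the final one. This is a simplified illustration of the mechanism, not ComfyUI's own code; real pipelines typically also re-apply the final layer norm, as shown.

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "openai/clip-vit-large-patch14"  # the text encoder used by SD 1.x
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_encoder = CLIPTextModel.from_pretrained(model_id)

def encode_prompt(prompt: str, clip_skip: int = 1) -> torch.Tensor:
    """clip_skip=1 uses the last layer's hidden states; clip_skip=2 stops one layer early."""
    tokens = tokenizer(prompt, padding="max_length", max_length=77,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = text_encoder(**tokens, output_hidden_states=True)
        hidden = out.hidden_states[-clip_skip]
        # Re-apply the final layer norm to the (possibly earlier) hidden states.
        return text_encoder.text_model.final_layer_norm(hidden)

embeddings = encode_prompt("lighthouse, surreal, rocket ship", clip_skip=2)
print(embeddings.shape)  # torch.Size([1, 77, 768])
```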
That is something to consider. It is definitely the case that most guides tell you to increase the steps number to get a better result; also think about reducing the steps number for a particular kind of result, and think about reducing the CFG number to get a more grey, more desaturated outcome. Do you remember that image with the vehicle, that amazing-looking vehicle? Look how desaturated it is. We had a CFG of five; if the CFG were higher, we would begin to get weird and unhappy colours appearing inside the image.

As for that issue with the CFG scale producing nasty-looking results at fairly low numbers, it is something we actually saw a little of earlier on, in this particular situation, not that one, this one. We did a comparison earlier using the XY plot, and do you remember the results we got at a CFG scale of 16? They are noticeably more saturated than this one here; the CFG of 8, to my mind, looks better, and it looks better here too. And when we combine a high CFG with the DPM 2 Ancestral sampler, we end up with a CFG 16 result that is pretty much unusable. I could imagine potential use cases for these two, but in most cases the 8 level simply looks better.

So my recommendation is this: when you are working with CFG and developing your own workflow, one of the things you want to do is make use of the XYZ charts. They will tell you at what point your particular designs begin to fall apart as the CFG number rises. Raise the CFG number if you need to, but use these charts to figure out how far you can go with your workflow. If you use DPM 2 a lot, you may not be able to go very high; if you use Euler, and here we have Euler and Euler Ancestral, you may be able to go a little higher with the CFG scale, but be careful, because even 16 can produce problems with contrast and colour.

Now, these problems with contrast and colour: how can we take care of them? Keep the CFG scale number low. My recommendation is that when you feel you need to increase the CFG scale beyond what you can bear in terms of colour saturation and contrast, instead of increasing it further, add more information to the prompt. Doing that will allow the software, the CLIP network, to understand more about what you are trying to say. I don't know whether this will work with your workflow, but it is something I have tried: put in more terms. This is what I was suggesting in the previous video, where we were talking about the Persian princess, the Egyptian princess, with lots and lots of terms. Sometimes you need to do that for the software to understand what you are trying to achieve. Increasing the number of prompts, changing the prompts so the meaning is more precise about what you want, even including prompts that are not exactly what you want but are related to it: sometimes doing that is better than increasing the CFG scale. That is one way of tackling the problem with the CFG scale. Hopefully at this stage I have persuaded you that there is an issue with the CFG scale when you go above the mid-teens, and hopefully this has been useful information, because I think some of these issues are not just found in Stable Diffusion 1.5.
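A simple way to build the kind of XY chart recommended here, outside of ComfyUI's own XY plot nodes, is to sweep steps and CFG with a fixed seed and tile the results into one image. The sketch below uses the diffusers library as a stand-in for whichever backend you prefer; the model name, prompt and value ranges are just examples.

```python
import torch
from diffusers import StableDiffusionPipeline
from PIL import Image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "lighthouse"
negative = "text, watermark, ugly, disgusting, disfigured, nude"
cfg_values = [4.0, 8.0, 16.0]   # columns: where does colour start to blow out?
step_values = [10, 20, 30]      # rows: where do extra steps stop helping?
seed = 24                       # hold the starting noise fixed across the grid

grid = Image.new("RGB", (512 * len(cfg_values), 512 * len(step_values)))
for row, steps in enumerate(step_values):
    for col, cfg in enumerate(cfg_values):
        image = pipe(
            prompt,
            negative_prompt=negative,
            num_inference_steps=steps,
            guidance_scale=cfg,
            generator=torch.Generator("cuda").manual_seed(seed),
        ).images[0]
        grid.paste(image, (col * 512, row * 512))

grid.save("xy_steps_vs_cfg.png")
```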
You find them with other versions of Stable Diffusion as well, and they really have to do with how the software actually works; they are deep issues within the software design. One other thing that might become a possibility for you, depending on how much time you want to spend, is this: if you open up the Manager and go into Custom Nodes, there are a couple of very interesting extensions, which I think come from Stability AI. One is the ComfyUI_experiments sampler tonemap node and the other is the sampler rescale CFG node, and these two are really designed to tackle some of the problems we have been looking at, in terms of the way the CFG produces too much saturation and too much contrast. So hopefully I have demonstrated that it is a problem, and hopefully you also now understand that other people have found issues with the CFG.

There is actually a very interesting link in one of those custom nodes to a paper that was produced, of all groups, by the folks over at ByteDance. It is a very recent paper describing some of the problems with the CFG scale, called "Common Diffusion Noise Schedules and Sample Steps are Flawed", and it suggests that the way we use CFG might actually be broken. They make a case for that, and they make a case for what they think might be a solution, and that argument has been taken into account in one of these new extensions. These are experiments, they are experimental, but you are free to install them and experiment with them. When I was using them I actually rather liked them; I thought they did a lot of good. So hopefully we may see some improvements coming along at some point in terms of how the CFG works.

Otherwise, try to keep the CFG safely low, experiment with steps that are low, experiment with steps that are high, and be aware that when you bring in workflows from other people, you will sometimes find the clip skip has been set to minus numbers that are a little lower than is perhaps ideal for your workflow, so you may want to change that. And finally, make lots and lots of use of XY plots to understand what impact the steps are having, to avoid too much damage from CFG numbers that are too high, and also to understand where the CFG numbers begin to introduce a lack of saturation and contrast that might damage the outcome of your prompts and your designs.
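For reference, the fix proposed in that paper, and implemented in the rescale CFG extension, rescales the guided prediction so its statistics stay closer to the purely conditional prediction, then blends the two. Here is a sketch of the idea following the paper's formulation; a `phi` of around 0.7 is the value the paper suggests, and the toy tensors stand in for one step's U-Net outputs.

```python
import torch

def rescale_cfg(noise_cond: torch.Tensor, noise_uncond: torch.Tensor,
                cfg_scale: float, phi: float = 0.7) -> torch.Tensor:
    """Classifier-free guidance with the rescaling proposed in
    'Common Diffusion Noise Schedules and Sample Steps are Flawed'."""
    guided = noise_uncond + cfg_scale * (noise_cond - noise_uncond)

    # Match the per-sample standard deviation of the guided prediction back to
    # the conditional one, which reins in the over-saturation high CFG causes.
    std_cond = noise_cond.std(dim=list(range(1, noise_cond.ndim)), keepdim=True)
    std_guided = guided.std(dim=list(range(1, guided.ndim)), keepdim=True)
    rescaled = guided * (std_cond / std_guided)

    # Blend the rescaled and plain-CFG predictions.
    return phi * rescaled + (1.0 - phi) * guided

# Toy stand-ins for one sampler step's predictions.
cond = torch.randn(1, 4, 64, 64)
uncond = torch.randn(1, 4, 64, 64)
print(rescale_cfg(cond, uncond, cfg_scale=16.0).std().item())
```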
Info
Channel: Pixovert
Views: 6,876
Keywords: classifier free guidance, comfyui, stable diffusion, professional prompt engineering, prompt engineering like a pro, clip skip in comfyui, sampler steps, improved images stable diffusion, cfg stable diffusion, cfg contrast, cfg color
Id: MirJlhFMdB8
Length: 42min 31sec (2551 seconds)
Published: Mon Sep 11 2023