At some point while using Stable Diffusion and Fooocus, you'll probably find yourself wanting to get a specific pose, a specific pose mixed with a particular design, or even to swap faces. If you want to figure out how to get more consistent characters with Fooocus and Stable Diffusion, the features I'm going to cover in this video will be very useful for that. In this video, we're diving into the Image Prompt feature of Fooocus. If you've used other tools for Stable Diffusion, or Midjourney, this chart gives you an idea of the differences between Fooocus and the others: with Fooocus, you can mix image and text prompts reliably, and it performs better in most instances than other interfaces for Stable Diffusion. I can't confirm every part of that from my own testing yet, but overall I've found it very reliable. Now, this video assumes you already have Fooocus installed and know the basic usage; if not, I do have other videos that cover that. I'm going to be doing this with the regular run.bat file and the default settings and styles unless I say otherwise.

So, at this point, Fooocus is up and running.
I've turned on the Advanced tab, switched over to Quality, and set the 1024x1024 aspect ratio, just to keep things consistent with most of the images I'll be generating here. Now, to use the image prompt feature, we want to check the box down below here that says 'Image Prompt'. I'm going to show you the basic image prompt first, then we'll move on to the advanced options; in practice, you'll probably end up using the advanced options most of the time.

For this, I'm going to use a specific image - you can also drag in images from the gallery up above. I'm going to use this woman in a red dress, dancing in a field. The idea of the image prompt is that it influences the generated image. The result won't look exactly the same, but it should carry over things like the color she's wearing, the dress, and a similar-looking woman in a field - those sorts of things will carry over into most of the images you generate. Okay, as you can see, we have two images generated. In both of them, the red dress and a similar-looking woman are present; even the dress itself is very similar. The poses are slightly different and the background is different, but it's still the same idea for the picture, so you can see how that would influence the generation.
Now, I'm going to demonstrate that this isn't always going to be the case, and this is where the advanced features come in. I'm going to drop in this image of a TARDIS from Doctor Who and go ahead and generate. Again, I'm not putting any text in the prompt - you can, and I'll cover that afterwards - but for now I just want to show how this doesn't influence the image the way you might expect. Okay, as you can see, it did not influence this image very much. It did influence the color a little, but I can't really point to anything else. I'm not going to dive into this much, because the advanced features make the basic mode matter a lot less, but I am going to show you one more example. You can mix and match multiple pictures to try to influence the result, but you're going to get unreliable results - let's just put it that way.

For this one, I'm actually going to add a text prompt, but I've also dropped the picture of a dog down here just to show what this will do. What it should do is generate a picture of a cat with similar colors to this dog. As you can see, one of these was more influenced than the other, but you can still see the sun in the background carried over into both images. It won't always influence everything, and that's the reason the basic image prompt isn't necessarily that useful. For myself, at least, I don't ever really use it, because with the advanced controls, which I'm going to show you next, there's really not much reason to. But I did want to cover it.
Okay, at this point, we're going to move ahead to the advanced features, because I honestly think that's what you're going to be using - you'll find you really have no interest in the basic mode. So, we're going to go down to the bottom and check off 'Advanced'. Now you'll see more options come up: two sliders and the mode options down here, which I'll cover. The first mode is Image Prompt, which we were already using. Then you have the 'Stop at' and 'Weight' sliders.

Think of 'Stop at' as being all about timing: it determines when your image prompt's influence stops during the generation process. When it's set to 0.5 (50%), which is what it's at here, the image prompt's influence ends at the halfway point and Stable Diffusion takes over and generates from there. So the later you stop it, the more influence it has on your image generation. Now let's talk about the weight. That setting is like a volume control for your image prompt: the higher the weight, the stronger the impact your image prompt has on the generated image. It's like turning up the dial on how much your prompt shapes the style, composition, or other aspects of the final image. Usually, I start with the defaults and then play around with them to get what I want. A lot of this is trial and error, but always start with the defaults and go from there.
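If it helps to picture those two sliders as code, here's a very rough sketch of how I think about them. This is not Fooocus's actual code - the function and the names are made up purely for illustration:

def guidance_strength(step: int, total_steps: int, stop_at: float, weight: float) -> float:
    # How strongly the reference image influences this denoising step (conceptual only).
    progress = step / total_steps
    if progress <= stop_at:   # still inside the influence window
        return weight         # e.g. 1.0 pushes at the default strength; higher pushes harder
    return 0.0                # past 'Stop at': Stable Diffusion generates on its own

With a 'Stop at' of 0.5 and a weight of 1.0, for example, the reference guides roughly the first half of the steps; raising 'Stop at' toward 1.0 keeps it in control almost to the end, and raising the weight makes that control stronger while it lasts.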
So here, I'm going to do another image prompt, but I'm going to change a couple of things: we're going to increase the weight, and I'm also going to increase the 'Stop at' to a higher amount. Let's see what happens compared to these original two images. I'm going to put 'cat' in the prompt as well, just like we did before. Okay, the image generation is done. As you can see, this time the reference had much more influence on the image - it almost had a hard time creating a cat at all, though it did switch over at the end and manage to force that. So keep that in mind: if we had stopped the influence at an even higher amount, it would probably have had a much harder time creating the cat. At that point, the weight pulls more toward the colors and everything else in the reference image, and the result sticks much closer to that picture. That's how 'Stop at' and the weight work, and they work the same way for pretty much all of these modes.
Now, the next two I'm going to cover are PyraCanny and CPDS. I'll cover them somewhat together because they do similar things: they're used for bringing over the structure of an image. Maybe you want a certain pose, or you want a house to look a certain way. You can bring in an image and, unlike the plain image prompt we used before - a perfect example was the woman dancing, where the image was influenced but the pose itself wasn't - these will carry the structure over. This is where PyraCanny and CPDS come in.

PyraCanny breaks the image down to its outlines - think of it like a coloring book. If you look at this image on the left, you have the original photo of the house, and on the right, you have the outlines; that's similar to what PyraCanny extracts. How much detail is brought over is determined by the weight and the 'Stop at'. Raise or lower the weight and you get more or less detail, and the same idea applies to 'Stop at': it controls how long the original image influences the generation. At 0.5, once it gets halfway through, it stops using the original image and generates on its own from there.
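If you want to see roughly what that outline pass looks like, here's a small sketch using plain Canny edge detection in OpenCV. This is only an approximation of the idea, not Fooocus's actual PyraCanny code, and the file name and thresholds are made up:

import cv2

house = cv2.imread("house.jpg")                 # the reference photo
gray = cv2.cvtColor(house, cv2.COLOR_BGR2GRAY)  # edges are detected on a grayscale copy
edges = cv2.Canny(gray, 100, 200)               # white outlines on black, like a coloring book page
cv2.imwrite("house_edges.png", edges)           # an edge map like this is what steers the structure

Something along these lines is what ends up guiding the composition: the generation is pushed to follow those lines for as long as the 'Stop at' window lasts.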
Now, CPDS works for a similar purpose, but the difference is that instead of extracting outlines, it decolorizes the image. As you can see on this page, these are the original images and these over here are the results. CPDS can be better for certain images, whereas PyraCanny can be better for others; they both have their advantages and disadvantages, and a lot of it comes down to trial and error. PyraCanny is great for capturing and reproducing finer details and clean lines from an image - for instance, if you have a picture of a bunny and you want to keep its exact proportions in a new style, PyraCanny would be the tool to use. CPDS is more suitable when you want to transfer a complex scene or pose from one image to another, maintaining the general shape and depth but not necessarily the fine details. So experiment with the two: usually, if I don't get the results I want from one, I'll try the other to see if that method works better for what I'm trying to accomplish.
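In the same spirit, here's a rough stand-in for what CPDS-style decolorization keeps and throws away. OpenCV's contrast-preserving decolorization is a reasonable approximation, but again this is not Fooocus's actual code, and 'dancer.jpg' is just a placeholder name:

import cv2

img = cv2.imread("dancer.jpg")              # the reference photo
gray, color_boost = cv2.decolor(img)        # contrast-preserving grayscale plus a color-boosted version
cv2.imwrite("dancer_structure.png", gray)   # shapes and depth cues survive; fine color detail does not

That grayscale, shape-and-depth view is the kind of information CPDS works from, which is why it tends to preserve poses and scene layout rather than crisp line detail.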
Okay, so for these examples, I'm going to try a few different things just to give you a basic idea of how you can mix and match all of this. One note: from my understanding, you don't really want to combine PyraCanny and CPDS with each other, and I usually don't - use just one of those two across your image prompt slots. But alongside it you can have a regular image prompt, or a face swap, which we'll cover next. Those you can mix and match.
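Just to make that concrete, here's a hypothetical way to picture it - this is purely my own bookkeeping sketch, not Fooocus's internals: each image prompt slot is a reference image plus a mode and its own 'Stop at' and weight, and the slots all apply together on top of your text prompt and styles.

from dataclasses import dataclass

@dataclass
class ImagePromptSlot:
    image_path: str    # the reference image dropped into the slot
    mode: str          # "ImagePrompt", "PyraCanny", "CPDS", or "FaceSwap"
    stop_at: float     # 0.0-1.0, when this slot's influence ends
    weight: float      # how hard this slot pushes while it's active

# Illustrative values only - one structure mode plus a style reference and a face:
slots = [
    ImagePromptSlot("style_ref.jpg", "ImagePrompt", stop_at=0.5, weight=1.0),
    ImagePromptSlot("pose_ref.jpg", "PyraCanny", stop_at=0.5, weight=1.0),
    ImagePromptSlot("face_ref.jpg", "FaceSwap", stop_at=0.9, weight=0.75),
]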
So for this one, let's go with this house and PyraCanny, and I'm not going to touch anything else. I could generate with just that, but that's not really what you're going to be doing - you're going to be mixing it with text prompts and other image prompts, and that's what I'm going to show you here. Let's say I want a house that looks similar to this one, but in a lush forest. So we're going to prompt for a house in a lush forest and generate a couple of images. Actually, I'm going to switch over to Speed just to make this a little quicker for myself. For this one, as I said, we have PyraCanny set to the default settings: a weight of one and the 'Stop at' at 0.5, or halfway. I'll leave those as they are for now, show you the result you get, and then show you how adjusting those settings changes the results.
Now, just like with Stable Diffusion itself, you're always going to get varying results. We can see the first one actually came out with a pretty good forest around it; the other one, not so much. We can change how that's impacted. If you look at them, the house itself looks pretty similar to the original - the same basic structure - although, as you can see, this one added a chimney. So just like regular Stable Diffusion, it's going to add things; keep that in mind. I'm also keeping the prompt very simple, so that's another factor.

Now let's go in here and I'll show you a difference: we're going to keep the same weight but set the 'Stop at' to 0.2, just to give you a general idea of how that changes the picture. You can see the point where it stops using the reference and generates on its own from there, and the result is definitely not as close. It keeps the house in the middle, things like that, but otherwise the influence is much lower. Watching the preview, it starts off and you think, 'Oh wow, that actually looks like the house,' but almost in an instant, right there, it changes and you lose that influence.
So, the further along you let the influence run, the more the result is going to look like the original picture. The trade-off is that if you're trying to make something radically different, it's going to have a harder time doing that. Now let's crank the 'Stop at' way up - almost all the way to one - and generate again to see the difference. Notice pretty quickly that it keeps even the clouds; these clouds line up with the ones here, so it stays very close to the source picture until the very end, and then may add some variation. Usually it's only adding minor details toward the end, so nothing is going to change drastically in the image when you set the 'Stop at' that high. Another thing to keep in mind is that all of this can be mixed with styles as well. As we can see with this one, it actually did a pretty decent job even with the 'Stop at' set very high - it still kept that bush and the small tree back here. So you get the idea: play with these and get a feel for what they do.

Let's stop this one, go back to a 'Stop at' of 0.5, turn the weight down, and generate again so I can show you the difference. As you can see in this image, the influence is much weaker - the house is different, and the scene comes out much lusher for us. So you can play with these settings to dial things in once you know what you want.
Okay, so now we're going to use this image of a woman dancing in the field. I picked this one because it isn't a standard pose - it would be very hard to get this pose consistently with different clothing styles or different people, and it would be very hard to describe in a text prompt, so this is where this sort of thing comes in handy. We're going to use CPDS for this one; like I said, try different things, because they all behave differently. Another thing to keep in mind is that the fewer extra details there are in the picture, the better: if we mostly want the pose, we'd ideally use a picture without any trees in the background or anything else, if we can get one. Also, say you don't want the person wearing a dress - that's the reason I didn't use the dress image, because the outline of the dress would carry over into the result. You're better off using a reference that keeps a general body shape, as I'll demonstrate here.
Now, let's prompt for a warrior dancing in the forest and see what that does. Like I said, we've only enabled CPDS - the one that uses decolorization - with the 'Stop at' at 0.5 (50%) and the weight set to one, which are the default settings. As we can see while it's generating, we have that same pose. It will vary slightly, but I find that in most cases, if you have a good source image, the pose comes through almost every time, pretty closely; the hands might be slightly different, things like that. That's also why you want a source picture that preferably doesn't have much in the background. In these two results, as you can see, the pose is pretty exact to the original, and it's even in pretty much the same spot in the frame.

So now, let's say we have the pose, but we also want a certain style. We could describe it in the text prompt, but it might be easier to just drop in another image. So in this one, I'm going to drop this image in here and use it as a regular Image Prompt. Remember, that one influences the look; the other one we're leaving for the pose.
Now that we have those set, I'm going to remove the text prompt, because I want to show what the image prompts do on their own. As I said, the one over here drives the structure of the image, and the one over here drives the style, the color, and all of that. I am going to increase the weight on the style image a little, just because I don't have any text prompt to lean on, and then I'll hit generate. Okay, as you can see, we get a style similar to the image on the right, with the pose and structure from the image on the left, which is the one using CPDS. We could potentially use PyraCanny for this instead; as I said, they both do similar things, just in different ways. One thing to notice is that I don't have anything in the text prompt, so these two images are the only things influencing the result. But let's say I do want to add something: I'm going to type 'a warrior with a sword.' I'm not specifying 'dancing' or anything like that - the structure should carry over on its own, so I shouldn't have to worry about adding it. As we can see in these images, it's added a sword, kept the same pose, and taken the style from the image on the right. I could generate fifty of these - I tested this earlier and honestly generated a whole bunch; these are just some of the images - and as you can see, the results are pretty consistent. If you're not getting the right results, consider that your source image may not be good enough or may have too much background clutter. Even using an image with a watermark on it can produce some weird effects. Those are the things to keep in mind when you're doing this.
Okay, for the last thing, I'm going to show the face swap option down here in the image prompt. You can mix this with everything I've already shown you - text prompts, styles, the other modes - however you want, but here I just want to demonstrate how it works on its own. I'm going to start with this image, and I'll simply put in 'a woman with black hair' without getting any more detailed than that. The idea is that the generated face should look pretty close to the reference. We haven't cranked the weight up, and the 'Stop at' is already set fairly high for this mode, so it retains more of the structure of the face. I always start with the defaults and then increase them if I find it necessary to get what I'm looking for, because the defaults are usually set at what was found to be the best settings. Looking at these results, I'd say they're actually pretty close. They're not always going to be exact, and like I said, you can increase the weight. I'm also going to swap in a different face here without changing the prompt at all, and as you can see, that definitely influenced the result. It's obviously not always going to match exactly, and that's where you can use multiple references: if you have two, three, or four different face angles, you can add another one with a different angle, and that will improve the results you get.
Okay, I think that pretty much covers everything in the image prompts - the basics and the advanced options, though not the most advanced use cases. There's a lot more we could get into with mixing and matching, and I'll probably cover some of it in later videos on consistent characters and things like that. The biggest thing with any of these is to experiment, and if you don't get good results, keep your prompts simple and try different images. Low-quality images, I find, don't always work well; if you're trying to use a pose from a low-quality image with a lot of pixelation, that can really hurt the results you get. I hope you learned something from this. My next video will probably be on inpainting and outpainting - hopefully within the next four to six days, since it's going to take a little bit longer to produce, like this one did. After that, I hope to do some even more advanced tutorials on consistent characters and things like that. So thanks for watching, have a good day, and see you next time.