Welcome to another Stable Diffusion tutorial.
Today we'll be looking at ways to integrate Stable Diffusion into Photoshop. For this tutorial we'll
be using ComfyUI and Stable Diffusion. If you don't know how to use ComfyUI, or haven't seen the
first video about how to install it and run your first generations, please watch that before
attempting this one. In this tutorial we'll first build the node workflow, and then we'll look at
some of the use cases such a workflow can have in a production environment. As always, you'll
find the workflow files in the description, as well as any models you might need.
And remember that if you're missing something from the workflow you can just install missing
nodes or missing models from the manager. And, if you don't have the manager installed, you
can go back and watch the first video where I explain how to do so. And that's about it for
the introduction, so let's go build some nodes. First of all, we're going to set up the node
responsible for communicating with Photoshop. Double-click and search for “Photoshop to
ComfyUI”. This node has a password field, which we'll enter into Photoshop later; for now,
set it to 12341234. It also has an image output, which is the currently active image in Photoshop,
and since we might be working with larger images, we want to resize it to a resolution we can
work with in Stable Diffusion. So drag out a connection and search for a “Resize” node. We will
input 1024 by 1024, since that's big enough to retain detail and small enough to keep real-time
generations fast. If you don't have some of these nodes, don't worry: you can install them via
the ComfyUI Manager by going to Manager > Install Missing Nodes.
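By the way, if it helps to see the graph as text, here's a minimal sketch of what these first two nodes look like in ComfyUI's API-format JSON export, written out as a Python dict. Treat it as illustration only: the class names of the Photoshop bridge node and of the resize node are placeholders, since both come from custom node packs and may be named differently in your install.

```python
# Hypothetical sketch of the front end of the workflow, in ComfyUI API-export style.
# Node keys ("1", "2") and both custom-node class names are placeholders.
frontend = {
    "1": {  # "Photoshop to ComfyUI": streams the active Photoshop canvas
        "class_type": "PhotoshopToComfyUI",   # placeholder name, check your install
        "inputs": {"password": "12341234"},   # must match Photoshop's remote connection password
    },
    "2": {  # resize the (possibly huge) Photoshop canvas to something SD-friendly
        "class_type": "ImageResize",          # placeholder name for your resize node of choice
        "inputs": {
            "image": ["1", 0],                # image output of the Photoshop node
            "width": 1024,
            "height": 1024,
        },
    },
}
```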
The next thing we need is obviously a model. Search for “Load Checkpoint” and select your
model of choice, in my case it’s going to be epicRealism. Then search for the “Load
VAE” node to load up a VAE if necessary. Since we want to be able to generate in
real time while we're working in Photoshop, we'll use an LCM LoRA, which allows far fewer
sampling steps and therefore much faster generations than a regular setup. Search for the
“Load LORA” node, and select your LCM LoRA.
By the way, you will be able to find the links to all these models in the description below.
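For reference, here's a hedged sketch of how these three loader nodes would appear in the same API-format export. The class names are the standard built-in ones, but every filename is just an example, so swap in whatever checkpoint, VAE, and LCM LoRA you actually downloaded. The CLIP link into the LoRA loader is the connection we'll wire up in a moment.

```python
# Core loader nodes (built-in ComfyUI class names); filenames are examples only.
loaders = {
    "3": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "epicrealism.safetensors"},   # your SD 1.5 model of choice
    },
    "4": {
        "class_type": "VAELoader",
        "inputs": {"vae_name": "vae-ft-mse-840000-ema.safetensors"},  # optional external VAE
    },
    "5": {  # the "Load LORA" node
        "class_type": "LoraLoader",
        "inputs": {
            "lora_name": "lcm-lora-sd15.safetensors",   # the LCM LoRA
            "strength_model": 1.0,
            "strength_clip": 1.0,
            "model": ["3", 0],                          # MODEL from Load Checkpoint
            "clip":  ["3", 1],                          # CLIP from Load Checkpoint
        },
    },
}
```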
Next up is the “KSampler” node. Search for it and set it up, then search for the “VAE
Decode” node and the “Preview Image” node which, although not strictly necessary in this workflow,
offers some debugging insights if needed. These are the main input and output nodes, but
what goes on in the middle of the workflow? Well, let's start by connecting the “Load Checkpoint”
model output to the “Load LORA” model input, and the “Load LORA” model output to the
“KSampler” model input. This makes sure the model is loaded, passes through the LCM
LoRA, and is then fed into the KSampler. Next, set up the KSampler settings. Since we're
using an LCM LoRA, we want a lower-than-normal number of steps and CFG, so let's set 8
steps and a CFG of 1.4. Set the denoise to a starting value of 0.4, and the sampler_name to “lcm”.
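In the same API-export terms, the KSampler ends up looking roughly like this. The node references follow the IDs used across these sketches: “5” is the Load LORA node above, while “7”, “8”, and “11” are the prompt and VAE Encode nodes we're about to add. The scheduler value is my own assumption, since the video doesn't mention it explicitly.

```python
# KSampler with LCM-friendly settings (low steps, low CFG).
ksampler = {
    "6": {
        "class_type": "KSampler",
        "inputs": {
            "model": ["5", 0],           # MODEL coming out of the Load LORA node
            "positive": ["7", 0],        # positive CLIP Text Encode (wired up below)
            "negative": ["8", 0],        # negative CLIP Text Encode (wired up below)
            "latent_image": ["11", 0],   # latent from the VAE Encode node (added later)
            "seed": 0,
            "steps": 8,                  # lower than usual, thanks to LCM
            "cfg": 1.4,                  # likewise low for LCM
            "sampler_name": "lcm",
            "scheduler": "sgm_uniform",  # not specified in the video; pick what works for you
            "denoise": 0.4,              # starting value, we'll raise it later
        },
    },
}
```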
Then, we connect the CLIP output from the “Load Checkpoint” node to the CLIP input in the “Load
LORA” node, and create our two “CLIP Text Encode” nodes from the “Load LORA” node's CLIP output.
Connect the conditioning outputs from the “CLIP Text Encode” nodes to the positive and negative
inputs on the KSampler. These will be our positive and negative prompt text boxes.
Then we connect the VAE output from the “Load VAE” node to the VAE input in the
“VAE Decode” node. On the same node, connect the Latent output from the KSampler to
the samples input. Then connect the image output to the images input in the “Preview Image” node.
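Sketched the same way, the conditioning and decoding part of the graph looks roughly like this. The prompt text is left as a placeholder for now, since we'll write the actual prompts in a moment.

```python
# Conditioning and decoding, in the same API-export style as the other sketches.
conditioning_and_decode = {
    "7": {  # positive prompt box
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "positive prompt goes here", "clip": ["5", 1]},  # CLIP from Load LORA
    },
    "8": {  # negative prompt box
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "negative prompt goes here", "clip": ["5", 1]},
    },
    "9": {  # VAE Decode: latents from the KSampler back into pixels
        "class_type": "VAEDecode",
        "inputs": {"samples": ["6", 0], "vae": ["4", 0]},   # VAE from the Load VAE node
    },
    "10": {  # Preview Image, handy for debugging
        "class_type": "PreviewImage",
        "inputs": {"images": ["9", 0]},
    },
}
```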
We're now missing a latent image. Since we want to work with pre-existing images sourced
from Photoshop, we'll be building an image-to-image workflow. Search for the “VAE Encode”
node, hook up the image output from the “Image Resize” node to its image input, and the same
VAE we're using to decode to its VAE input. Then hook up the latent output to the latent_image
input on the KSampler and we're done. Or, well, almost done, since we need a way to display
the resulting images somewhere other than the ComfyUI webpage. Search for the “PreviewPopUp”
node, and hook it up to the image output of the “VAE Decode” node.
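Here's the image-to-image part of the graph in the same sketch form. The “PreviewPopUp” entry is a placeholder: it's a custom node, so its class name and input name may differ in your install.

```python
# Image-to-image path: encode the resized Photoshop image into a latent,
# then pop the decoded result up in a floating window.
img2img_and_popup = {
    "11": {  # VAE Encode: turns the Photoshop image into the KSampler's starting latent
        "class_type": "VAEEncode",
        "inputs": {"pixels": ["2", 0], "vae": ["4", 0]},   # resized image + the same VAE
    },
    "12": {  # floating preview window beside Photoshop (custom node, names may differ)
        "class_type": "PreviewPopup",
        "inputs": {"image": ["9", 0]},                     # decoded image from VAE Decode
    },
}
```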
This node will create a floating window that we can place beside our Photoshop workspace
displaying our generated images in real time. Now, for the prompts. For this first
experiment, we’ll do stick figures, so we’ll start with a generic prompt like “photo
of a young woman standing on a beach in front of the ocean”, and some negative prompt terms
for things we don't want to see, like “illustration, 3d, render” and NSFW-related words.
Now that the node chain is complete, we want to head over to the right-hand side menu, click on
“Extra Options”, and enable “Auto Queue”. This will keep the node chain running once started.
We can also set it to queue only when it detects a change, and the “Photoshop to ComfyUI” node
can be set up the same way, but I'll keep both set to false, since I want it to generate images
continuously. Everything is ready, so let's head
over to Photoshop. Click on “Edit”, and navigate to “Remote Connections”.
Click on it, tick the “Enable Remote Connections” checkbox, and input the same password we set
in the “Photoshop to ComfyUI” node in ComfyUI, like 12341234. Set up your password of choice on
the node, copy it, and paste it into the corresponding box in Photoshop. Once that's done, click
OK, and the connection should be established. To check that it is, we can just hit the
“Queue Prompt” button inside ComfyUI
and the generation should start by sourcing from the blank image on Photoshop. Once the generation
is done, a popup will, well, pop up, and that’s our “PreviewPopUp” node working. We can place it
where we want it on our screen, then click on the pin button and it will stay there even if we click
somewhere else or switch programs and windows. Now to actually draw something. We’ll
start by drawing a crude human figure, and as you can see the popup is being refreshed
as soon as a generation process is done, which is quite fast for now. Even
if we keep adding details, though, the generated image stays very close to the
one we're drawing, and that's because our denoise value is quite low. So, let's try
to up it to something like 0.8. As you can see, now we are generating images that aren’t crude
drawings anymore! But since the generated pictures are quite random, we might want to tune
the KSampler settings a bit more, so let's tinker with the denoise value until we find the
right value for what we want. In this case, the outline is there, but the
picture is in black and white. We can then try to fill in the white spaces with color.
Like a blue for the sea, a brown for the sand, a light blue for the sky. We can also tinker a bit
more with the denoise value, like in this case, where we went down to 0.65, then 0.7. As you can
see, the generated pictures are now much closer to the original, and if we add more details, the
generated pictures will be more detailed too. The face will be in the correct place, the clothes
will be similar to what we drew, et cetera. But this is just the beginning. Let’s start
with a picture of a model. In this case, we want to dress it up. But first, as you can
see, the generated picture is struggling with the concept of depth. In order to fix that,
we can set up a ControlNet. Search for “Apply ControlNet (Advanced)”, then hook up the
conditioning nodes so that they will go through the “Apply ControlNet” node. Since
the generations are going on right now, we’ll finish linking them later. Then we need to
search for the “Load ControlNet Model” node, where we'll look for the depth model. You'll find it
in the description below, or you can search for it in the ComfyUI Manager's “Install Models” tab.
Hook up the CONTROL_NET output to the control_net input, and now we're only missing an image. First, we
need a way to extract a depth map from an image. We’ll search for the “MiDaS Depth Map” node.
But we don't want it to process a full 1024 image, so we'll duplicate the “Image Resize” node,
set it to 512 by 512, hook the resized image up to the “Depth Map” node, and feed the resulting
depth map into the “Apply ControlNet” node. We will then close the circuit by hooking up the
positive and negative outputs and inputs.
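Before resuming, here's a rough sketch of the depth ControlNet sub-graph we just wired in, again in API-export style. The depth preprocessor and the second resize node are custom nodes, so their class names are placeholders, and the ControlNet filename is only an example.

```python
# Depth ControlNet sub-graph: the conditioning now passes through Apply ControlNet (Advanced)
# before reaching the KSampler. Preprocessor/resize class names are placeholders.
controlnet_depth = {
    "13": {
        "class_type": "ControlNetLoader",
        "inputs": {"control_net_name": "control_v11f1p_sd15_depth.pth"},  # example filename
    },
    "14": {  # second resize, 512x512, fed straight from the Photoshop node
        "class_type": "ImageResize",                 # placeholder name
        "inputs": {"image": ["1", 0], "width": 512, "height": 512},
    },
    "15": {  # MiDaS depth map preprocessor (custom node, name may differ)
        "class_type": "MiDaSDepthMapPreprocessor",   # placeholder name
        "inputs": {"image": ["14", 0]},
    },
    "16": {
        "class_type": "ControlNetApplyAdvanced",
        "inputs": {
            "positive": ["7", 0], "negative": ["8", 0],
            "control_net": ["13", 0],
            "image": ["15", 0],                      # the depth map
            "strength": 1.0, "start_percent": 0.0, "end_percent": 1.0,
        },
    },
}
# The KSampler's positive/negative inputs are then re-pointed at ["16", 0] and ["16", 1].
```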
Aaand we get an error, because I forgot to hook up the new “Image Resize” node to the
“Photoshop to ComfyUI” node, so there's no source to resize. Let's fix that and hit “Queue Prompt”
again. As you can see, the generation is a bit slower now, but the depth of the generated picture
is much better. Going back to Photoshop, I will pull up some clothes. The popup will refresh with
an image of a model with a sweater on top, and will
try its best to make sense of it. The light is kind of coherent, and it even warped the sweater
to follow one of the arms. But since we would like a better approximation, we can try masking the
sweater and warping it so that it fits the body a bit more nicely. Keep in mind that while
we’re warping, the preview will not update, since the underlying image is still “unchanged”.
But as soon as we confirm the warp, the resulting image will be a lot better than what we have in
Photoshop. We can do the same for a skirt now, and as you can see it’s already quite good,
even adding some sort of pockets on the side. If we mask and warp the skirt, it’s going
to result in an even better generation. If we don’t want a pocket, we can just
mask in the corresponding hidden hand. So, the generated image is adding coherence to
the rough compositing we've done, all in real time. Quite useful for the previsualization
stages of a production, if you ask me. The clothes, though, are a bit different from the
reference still. This is because we’re using LCM. We can try and switch it off and use a normal
sampler, sacrificing some speed in the process, and see if that works too. In order to do
that, right-click on the “Load LORA” node and select “Bypass”. The chain will keep
working; it's just not going to be affected by the bypassed node at all.
Then, we can select a more usual sampler, like “dpmpp_2m”, and a more conventional scheduler,
like “karras”. If we resume the generations now, though, we can see that the resulting image
is quite bad. That's because, while LCM can work well with a low number of steps, normal
samplers just can’t. So let’s go up to 14 steps, and set the CFG to 2, and generate again. A
little better, but the denoise value is still too low to notice any huge difference. Let’s up
the steps to 18, CFG to 4, and denoise to 0.4, and try again. The results now are much better,
although the speed is not quite the same as before. It’s a tradeoff we need to be wary of.
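For quick reference, these are the two sets of KSampler values we've ended up with in this video, one with the LCM LoRA active and one with it bypassed. The LCM scheduler is an assumption on my part, since it isn't stated explicitly; everything else comes straight from the settings above.

```python
# The two sampler presets used in this video. Starting points, not gospel:
# tune steps, CFG, and denoise to your use case.
SAMPLER_PRESETS = {
    "lcm (LoRA active)": {
        "steps": 8, "cfg": 1.4, "sampler_name": "lcm",
        "scheduler": "sgm_uniform",   # assumption, not stated in the video
        "denoise": 0.4,               # raised to 0.65-0.8 for the stick-figure experiments
    },
    "regular (LoRA bypassed)": {
        "steps": 18, "cfg": 4.0, "sampler_name": "dpmpp_2m",
        "scheduler": "karras",
        "denoise": 0.4,
    },
}
```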
At this point, we can keep tinkering with the values until we find the parameters that work the
best for us depending on the situation at hand. But if we’re doing previsualization, why stop
at clothing? We can also insert a background, like a forest, or an interior. The generated image
will be a lot more coherent than the mashup we set up in Photoshop, and with minimal effort. These
are just some examples of what we can do in terms of previsualization, but it’s a technique that can
help in production environments. The only thing that’s not really working, though, is the shadow,
and that can be taken care of with some post work. Let’s check out one more possible application.
We’re going to generate some fashion illustrations, starting from this picture.
Let’s set everything back to the LCM settings, and we will change the prompt to reflect what we
now want. So we’ll write “Fashion illustration of a young woman” as positive prompt, and we’ll
remove “illustration” from the negative prompt, while adding “paper” to it, so the result doesn't
look like it's drawn on paper. We will then raise the denoise value to 0.8, since we want a very
different style from the original image, and since we no longer care about depth but do want the
edges to stay consistent, we'll switch the ControlNet model from depth to canny. The “Depth Map”
node is not needed anymore, so we'll delete it, and in its place we'll search for the “Canny
Edge” node, which will extract the outlines of the Photoshop image. Hook it up, and we'll add
“watercolor” as a style to the positive prompt, since we’d like to get some watercolor images.
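In graph terms, the only things that change from the depth setup are the preprocessor and the ControlNet model. A rough sketch follows; the canny ControlNet filename is just an example, and I'm using the class name of ComfyUI's built-in “Canny” node here, so adjust it if your “Canny Edge” node comes from a preprocessor pack.

```python
# Swap the depth branch for canny edges; everything else stays the same.
canny_branch = {
    "13": {
        "class_type": "ControlNetLoader",
        "inputs": {"control_net_name": "control_v11p_sd15_canny.pth"},  # example filename
    },
    "17": {  # edge preprocessor replacing the MiDaS depth map node
        "class_type": "Canny",          # built-in node; your "Canny Edge" node may differ
        "inputs": {
            "image": ["14", 0],         # the 512x512 resized Photoshop image
            "low_threshold": 0.4,       # example thresholds, tweak to taste
            "high_threshold": 0.8,
        },
    },
}
# "Apply ControlNet (Advanced)" now takes its image input from ["17", 0].
```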
Now let's hit “Queue Prompt”, and going back to Photoshop we can see how we can get some
fashion illustrations effortlessly. The colors and the overall style are different,
since we have a very high denoise value, but this lets us influence the overall design by
painting over the original image if we want to, and get some different results very quickly. So on
the one hand, it’s not precise, but on the other, it lets us brainstorm new ideas faster. It really
depends on what stage of preproduction you are at. If we want to have more precise results,
we can always tinker with the denoise and the other parameters, but this will result
in more or less realistic images. It's up to you to find the combination of parameters
that works for you, and to apply the right workflow to your - or your clients' - needs.
That's it for today. As always - you know the drill - please like and subscribe, as it helps
a lot since this is a new channel. I'm Andrea Baioni and you can find me on Instagram
at risunobushi or at andrebaioni.com. Also please comment if you have any questions
or suggestions for this or the next videos. See you next time with more workflows that can be
applied in production environments for creatives.