Hello everyone and welcome to another episode of
Stable Diffusion for Professional Creatives. Today we're going to take a lot of what we learned in
previous videos and turn this very rough Blender composite into good, interesting, and, more importantly, art-directable generated pictures. We are going to start with a very simple scene
like this one and we'll move on to a more complex scene like this one and we're going to end with
a very complex scene like this one in order to generate images such as these while maintaining
the ability to work independently on the individual elements that compose these images, such as the
subject, the backgrounds, the objects inside, the camera angles, or anything really that an
art director or a client may ask us to change. Anyone who's ever tried to generate on commission
knows that clients and art directors are used to tweaking little things here and there. And that's something that Stable Diffusion is not really great at doing. We cannot tell Stable Diffusion to slightly change the camera angle, or to turn the subject slightly towards the left. And since, well, I'm used to doing all those things in a studio or on a
set, I thought, well, what's the closest thing to a set that we can use in order to prepare our
images for Stable Diffusion? And well, what are 3D programs if not a virtual set of sorts? Now, same
as always, you will be able to find this workflow in the description below, along with any nodes or
models you might need. At the end of the video, I will tell you about how this workflow works
and what's it all about. But for now, you just need to know that it's basically two different
workflows tied together. So in order to work in a 3D environment, while having the option of work
on each little thing independently, I realized that I could use a technique that is nothing new
really. It's been used in animations like Dsny films since decades ago. What I'm basically doing
here is just splitting the subject, an object, the mid ground and the background into different
planes, which is basically the same as something like this. And in order to have the files ready,
I cut out everything that I didn't need from these pictures. So in this case, our main subject is a cutout with everything else being transparent; our chair here received the same
treatment, the windows in this lounge have been rendered transparent, and the background doesn't
need transparency, well, because that's the background. And if we hop over to our camera view,
we can see that everything works together. Now, you might say, well, you could do that in
Photoshop as well, there's no need to do this in Blender. And well, yes, that is definitely
true. But that is only true in this case, because in this case, we're only using planes.
We'll see later how we can use actual meshes in order to work on more complex environments.
But in this particular case, what we can do is just select, for example, the planes that we
want, in this case, the chair and the subject, and resize them at will, place them wherever we want to. We can resize the background, we can move it around, we can do anything we want. More importantly,
we can do anything that a client or an art director might want without resorting to
masking, without resorting to inpainting, without resorting to all that kind of stuff.
And the great thing about this workflow, this approach, is that we don't even need to care
too much about comping correctly, about having the right lights, about having the right kind of
depth, because regenerating this composition will take care of all of that. Now jumping back into
ComfyUI, long-time viewers will recognize this workflow. It's very similar to my SDXL Lightning
from Blender to Generative AI workflow. Now, while we could use SDXL Lightning this time as well, since we are using two different ControlNets, one for depth and one for Canny, depending on your hardware you might get faster results with either SDXL Lightning or 1.5. In my case I am using a 1.5 model, because I find that with my particular setup it is a bit faster, and also because later we will be using an IC-Light group, and IC-Light uses 1.5, so loading up two different models, one SDXL Lightning and one 1.5, would put a bit more strain on your hardware. Now, the core of the workflow is the
Screen Share node. The Screen Share node allows us to screen grab from any window. In this case it's grabbing the screen from the viewport inside of Blender, and the area it's grabbing has been set to the camera view. So this is going to be our source image. Then we are getting our depth and our Canny maps extracted from the image, and they're being applied through two different ControlNets. And as a first step we are generating an image at a lowish denoise of 0.45, describing the kind of image that we want to get. In this case, an advertising photo of a
well-dressed man with a tailored suit, sitting in a luxury living room, mountains in the background.
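If it helps to see this first pass as code rather than as a node graph, here is a rough diffusers sketch of the same idea: an img2img generation at a low denoise, guided by a depth and a Canny ControlNet. This is not the actual ComfyUI graph from the video; the model IDs and file names are assumptions, and the control maps are assumed to have been extracted beforehand.

```python
# Rough diffusers equivalent of the first-pass group: img2img at low denoise
# (strength), conditioned by a Depth and a Canny ControlNet. A sketch of the
# idea only, not the ComfyUI graph; model IDs and file names are assumptions.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

depth_cn = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16)
canny_cn = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16)

pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # swap in your preferred 1.5 checkpoint
    controlnet=[depth_cn, canny_cn],
    torch_dtype=torch.float16,
).to("cuda")

source = load_image("blender_viewport.png")   # the screen-grabbed comp
depth = load_image("depth_map.png")           # pre-extracted depth map
canny = load_image("canny_map.png")           # pre-extracted edge map

result = pipe(
    prompt=("advertising photo of a well-dressed man with a tailored suit, "
            "sitting in a luxury living room, mountains in the background"),
    image=source,
    control_image=[depth, canny],
    strength=0.45,                             # the "lowish denoise" from the video
    controlnet_conditioning_scale=[1.0, 0.1],  # depth strong, Canny gentle
    num_inference_steps=20,
    guidance_scale=6.0,
).images[0]
result.save("first_pass.png")
```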
And we get a decent enough result. The issues that we had with the rough comping are being taken
care of, but the lighting is still kind of off, so what we could do is enable this other group
that takes care of relighting. If we hit Queue Prompt again, we can see that the lighting in the intermediate stage is much better, the lighting in the last stage is a lot better, and all from
a very rough comping inside of Blender. Now here there might be an issue. Let's say the client
wants the subject to remain the same, because, let's say, it's a fashion client and they want to sell the clothes that the model is wearing. Well, since we are using two ControlNets, the coordinates of the subject in both the original picture and the generated picture are going to be the same. So what we could do is enable this intermediate group as well, which is a Segment Anything group that grabs the original subject from the original image and places it on top of the generated image.
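To give an idea of what that merge amounts to, here is a minimal sketch of the compositing step, assuming the subject mask has already been extracted (in the workflow, the Segment Anything group does that part); the file names are placeholders.

```python
# Minimal sketch of the "merge original subject" step: paste the subject from
# the source comp on top of the generated image using a binary mask. In the
# workflow the mask comes from a Segment Anything group; here it is assumed to
# be a pre-saved grayscale image that matches the source resolution.
from PIL import Image

original = Image.open("blender_viewport.png").convert("RGB")
generated = Image.open("first_pass.png").convert("RGB").resize(original.size)
subject_mask = Image.open("subject_mask.png").convert("L")  # white = subject

# Because both ControlNets keep the subject in the same spatial coordinates,
# a straight masked paste lines up without any manual alignment.
merged = Image.composite(original, generated, subject_mask)
merged.save("merged.png")
```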
And if we go check out the relighting results, we can see that the subject has remained the
same. Now, a great thing about the Screen Share node is that it keeps on doing its thing while we work inside of Blender. So what we could do is go back inside of Blender, bring up a pop-up node that has been appended at the relighting stage, place it anywhere over the viewport, and begin working on
our picture depending on the client's directions. In this case, let's say the client wants the
subject to be closer to the camera. Now how would you do that inside of Stable Diffusion
only? Well, good thing we are doing it with Blender: we can resize the subject, bring it very, very close, and place it right here, for example, in the center of the frame. Then we just wait for the Screen Share node to do its thing; you just need to hit Live Run, and this way the Screen Share node will continue working in the background. And here we get the result. Now, it's not exactly live, but it's really quite fast. Or let's say that the client wants to flip the room. We can do that while, in the background, the workflow takes care of it and displays the result right here inside of Blender. All automated, all really fast; not really real time, but not not real time either.
It all kind of depends on your hardware. Now, all of this was done with simple planes, and the way these planes work is by adding a shader that is just a normal Principled BSDF with an Image Texture appended onto it. The Color output is going into the Base Color input and the Alpha output is going into the Alpha input.
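For reference, this is roughly what that material looks like if you set it up with Blender's Python API instead of the shader editor; the object and image names are placeholders.

```python
# Blender (bpy) sketch of the plane material described above: an Image Texture
# feeding a Principled BSDF, Color -> Base Color and Alpha -> Alpha, so the
# cutout's transparency is respected. Object and file names are placeholders.
import bpy

mat = bpy.data.materials.new("CutoutPlane")
mat.use_nodes = True
mat.blend_method = "HASHED"  # so Eevee renders the alpha; may vary by Blender version

nodes = mat.node_tree.nodes
links = mat.node_tree.links
bsdf = nodes["Principled BSDF"]

tex = nodes.new("ShaderNodeTexImage")
tex.image = bpy.data.images.load("//subject_cutout.png")  # path relative to the .blend

links.new(tex.outputs["Color"], bsdf.inputs["Base Color"])
links.new(tex.outputs["Alpha"], bsdf.inputs["Alpha"])

bpy.data.objects["SubjectPlane"].data.materials.append(mat)
```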
This way the plane shows our original image together with its transparency. But what if we wanted to have
more precise control with more complex scenes? For example, what if we wanted to have a mesh? A mesh
would allow us to create a more complex scene and to move inside and outside of that mesh as well.
Like in this case where I've recreated this sci-fi scene starting from this starship over here and
adding characters to it. In this case, in order to proceed, we want to update the positive prompt,
of course. So we'll go for advertising photo of a sci-fi movie, mecha robots on patrol on a steel
spaceship, desert planet, sunset, for example, something really simple. Then we'll just have to
check that our area is set correctly, hit Live Run, and as we can see, we start getting our results. Now, in this case I don't care too much about our subjects staying the same in the generated image, so I'm just going to bypass this Merge Original Subject group. But let's say our art director doesn't like this robot over here, so let's move it out of the way. And they also wanted to see the inside of the ship a bit more. So what we want to do is just turn the scene around a bit, reframe the camera, and move our subject robot a bit further inside.
Maybe we want to lower the denoise a bit, let's say to 0.3. And as you can see, we can follow
the directions of our client, or our art director, or whomever, very, very easily. And this is great, because otherwise, if I had to try and randomly get the shot that the client or the art director wants, I don't know how much time I would need. So now we've seen how to work with simple planes only, and with some planes and a single mesh; let's now check out how we can
work with the whole environment being a mesh and our actors, our subjects being planes. Now in
this case I have found this environment mesh that allows us to place our subjects however we want. The only thing that is lacking is a sky, so I'm going to bring that in as a plane. And if we hop into our camera view, we can see that we have all our actors placed inside of the scene. We can see how this is a very rough composite, but we don't need to be precise anywhere outside of the camera view, because Stable Diffusion is not gonna see anything else but that, and we just need that to look good-ish. The generations and the relighting are gonna take care of everything else.
Now, in this case, we've got a very messy scene, a battle scene, so we want to go back to ComfyUI and be sure to change our positive prompt field accordingly: advertising photo of an action movie, Vikings engaged in a tragic battle, in this case. And since I changed the aspect ratio of my viewer, I want to set the area again. By using the pop-up preview node, we can see how the scene is coming together. Now, once again, we can change the camera angle, we can
change the position of our subjects. For example we can switch the camera angle over to here, let's
flip the camera, let's move our actors. The client might say, oh, well, I don't like this guy over
here. Let's move him to the back and let's bring this one to the front. I like him very much.
And we can do that on the fly without having to worry about random generations, without having
to worry about saying, oh, I'll have to do that later because I can't do it now. And while we've
seen this working for people and environments, it can also work with products, with animations, with anything really. And we don't have to worry about our subjects getting changed too much during the generations, because the positioning of the subject is going to be the same in the original pictures as well, so we can just swap the generated subject out and merge in the original one. This is one of the most art-directable and easiest ways that I've found to
provide clients with the ability to change things on the fly. Let's do a bit of a deep dive on the
workflow. Long-time viewers of the channel will recognize this workflow as very similar
to the one that I used for generating in real time while using Blender. In that workflow, I used SDXL
Lightning, whereas in this case, I am using a 1.5 model. That's epicRealism. Which one to choose
is completely up to you. In my case, I'm using 1.5 because with my setup and while recording,
I've found that it's a bit faster, but your mileage may vary. The core of this workflow is the
Screen Share node. The Screen Share node allows us to screen grab from any window. In our case we are screen sharing from Blender: we just have to select Share a Screen, pick our Blender window, and then set our area. In this case the area is the camera view, and by knowing the aspect ratio of our camera view we can then resize our image. In our case, we are resizing to 1920 by 1080, but if you want to be faster, you can use lower resolutions.
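The Screen Share node is a custom node, so I won't pretend this is its internal code, but as a plain-Python idea of what a screen grab plus resize looks like, something along these lines would do (the region coordinates are placeholders):

```python
# Not the Screen Share node's actual code - just a plain-Python sketch of the
# same idea: grab a region of the screen (the Blender camera view) and resize
# it to the working resolution. The region coordinates are placeholders.
from mss import mss
from PIL import Image

region = {"left": 100, "top": 100, "width": 1600, "height": 900}  # camera view area

with mss() as sct:
    shot = sct.grab(region)
    frame = Image.frombytes("RGB", shot.size, shot.rgb)

frame = frame.resize((1920, 1080), Image.LANCZOS)  # or a lower resolution for faster runs
frame.save("blender_viewport.png")
```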
Then we have two preprocessors. Since our images are very depth-intensive, I picked a Zoe Depth Anything preprocessor, because it has an outdoor mode that works great for this kind of complex scene. We also have a Canny edge preprocessor for the outlines. These two preprocessed images get passed on to two ControlNets. The depth one is at strength 1, because we really want the depth to be respected, and the Canny ControlNet is set at strength 0.1, because while I kind of care about outlines, I don't care about them as much as I do about depth. Then we have the usual positive and negative
prompt fields, and we have our KSampler. Now, depending on whether you're using an SDXL Lightning model or a 1.5 model, your settings are going to be different. In my case, for 1.5, I'm using 20 steps at CFG 6 with the Karras scheduler. If you're using a Lightning model, you are probably going to use 8 steps, CFG between 1 and 2, and SGM Uniform as a scheduler.
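Written out as plain values, those two sets of sampler settings look like this; which one applies just depends on the checkpoint you load.

```python
# KSampler settings from the video, one set per model family.
SD15_SETTINGS = {"steps": 20, "cfg": 6.0, "scheduler": "karras"}           # e.g. epicRealism
LIGHTNING_SETTINGS = {"steps": 8, "cfg": 1.5, "scheduler": "sgm_uniform"}  # CFG anywhere between 1 and 2
```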
Then we have our denoise value. Depending on how much we want the generated image to be closer to or further away from our composition inside of Blender, we have to set a denoise. In my case, I want my resulting generated image to be as close as possible, so I've set it to 0.3, but anything really could work; it's all up to you. Regardless of the denoise value, even if you set it to 1, the overall subject is going to stay in the same spatial coordinates, because we have two ControlNets taking care of that. Then we have an optional
Merge Original Subject group. This group is a very simple one, and it uses a Segment Anything group to isolate our subjects and place them on top of the generated image. In this case it still shows the mecha robots, because we didn't use it for that generation, but if we start a new generation we'll see that it works basically in the same way. Then we have our relight group, my white whale, or at least
a trimmed down version of it. It's basically a simplified version of my latest relighting
workflow, the one that takes care of color matching and detail preservation. Instead of
having four different options for color matching, we just have one, and it is the one that I like,
the fourth one. But you can pick any other one, really. You just need to copy and paste from
that workflow. In this part of the workflow, we have to use a 1.5 model, regardless of whether
or not you were using a Lightning model before. So we are loading up our checkpoint and our IC-Light
model. We are extracting a light map from our generated image based on white values. That means
that the areas that are closer to white are going to be the source of our light.
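As a rough illustration of what "based on white values" means, the light map is essentially driven by how bright each pixel is; a hand-rolled version could look like the sketch below. The actual IC-Light group uses dedicated nodes, and the threshold here is an assumption.

```python
# Hedged sketch of the "light map from white values" idea: the brighter an area
# of the generated image, the more it is treated as a light source. The real
# workflow does this with IC-Light nodes; this is only an illustration.
import numpy as np
from PIL import Image

img = np.asarray(Image.open("first_pass.png").convert("RGB"), dtype=np.float32) / 255.0

# Simple luminance -> light map; pixels near white dominate.
luminance = 0.2126 * img[..., 0] + 0.7152 * img[..., 1] + 0.0722 * img[..., 2]
light_map = np.clip((luminance - 0.5) * 2.0, 0.0, 1.0)  # keep only the bright half

Image.fromarray((light_map * 255).astype(np.uint8)).save("light_map.png")
```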
Then our image gets relit, and as you can see, it's not great right now. And that's because the colors are shifted
and the details are kind of different. So what we want to do is go through all of this frequency
separation and color merging part, which takes care of all of that. The original image and the relit image each get split into a high-frequency and a low-frequency layer. The high-frequency layer from the original picture holds the details, while the low-frequency layer, which holds color and light information, is merged and averaged between the original and the relit picture.
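Here is a small sketch of that frequency-separation and merge logic, assuming a Gaussian blur as the low-pass filter and a straight 50/50 average of the low-frequency layers; the actual nodes may weigh things differently, and the file names are placeholders.

```python
# Sketch of the frequency-separation / color-merge step described above.
# The blur radius and the 50/50 average are assumptions; images are assumed
# to be the same size.
import numpy as np
from PIL import Image, ImageFilter

def split_frequencies(path, radius=10):
    img = Image.open(path).convert("RGB")
    low = img.filter(ImageFilter.GaussianBlur(radius))   # low frequency: color and light
    img_a = np.asarray(img, dtype=np.float32)
    low_a = np.asarray(low, dtype=np.float32)
    return img_a - low_a, low_a                          # high frequency: details

high_orig, low_orig = split_frequencies("first_pass.png")
_, low_relit = split_frequencies("relit.png")

# Details come from the original; color and light are averaged with the relit pass.
merged = high_orig + (low_orig + low_relit) * 0.5
merged = np.clip(merged, 0, 255).astype(np.uint8)
Image.fromarray(merged).save("color_matched.png")
```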
On the right, at the end, we have our result. We have a Remap Image Range node which takes care of remapping whites and blacks in case our image turns out too dark. In this case I'm remapping the white point, because it was too dark. And then we have a preview pop-up node appended to the resulting image, so that we can use it inside of Blender to have a live preview without switching windows. So, all in all, I think that this workflow is very versatile in directing tiny little things, and even huge things really, be it camera angles, or the positioning of actors and subjects, or tweaking the little, insignificant things that art directors love so much to do while standing over your shoulder, saying, not too sure about that, can we change that? I thought about this workflow because recently a couple of clients asked for kind of the same thing: they wanted to be able to control a lot of different things in very difficult sets, and this was the easiest way I came up with to deliver all of that attention to detail. So I hope this
is as useful to you as it is to me and I hope you had some fun and learned something new. This is
gonna be it for today. If you liked this video, please leave a like and subscribe. My name is
Andrea Baioni, you can find me on Instagram at risunobushi or on the web at andreabaioni.com and
same as always, I will be seeing you next week.