Hello beautiful people, and welcome to what may
seem like a very click-baity episode of Stable Diffusion for Professional Creatives. And well,
let me tell you that as a fashion photographer by trade, I would love for this all to be clickbait,
but I will show you how we can get from the shot on the left, taken with an iPhone, to the shot in
the middle left, where we get a new background, to the shot in the middle right, where we get
a re-lit picture, to a final shot on the right, where we get a re-lit picture with a new
background while keeping fine details such as the text here — all with one click. This is
the workflow that we'll be looking at today. Now, I already hear you asking, "Where's
the workflow?" Well, the workflow is in the description below. You can download it,
and you can find every model you need as well as any custom nodes you need in the description
below, same as always. But since this is not an easy-to-use workflow, please bear with me for a
sec while I go through all of it and explain it to you. If you don't want to do that, there are
plenty of notes left there, so suit yourself. But don't come back crying when you use the
wrong model inside of the IC-Light Group. I am sure that most of you by now are familiar
with IC-Light, something that came out a few weeks ago and helps us relight pictures. If you
followed all my videos from the previous week, you are kind of familiar with this workflow,
at least with the part up here. I presented this workflow last Monday, and it was basically a
way to relight a product image, something that I felt would be a game-changer in product
photography. Then, during a live session in the middle of last week, I was challenged by a viewer
to keep the finer details — in his case, on a bottle of whiskey — and
preserve them in the final generation. During that live stream, I said that a good way
to do it would be to use a frequency separation technique in Photoshop so that we wouldn't need
to spend any time in ComfyUI doing weird things to get the same result. Well, what changed
over these last few days? Over the weekend, a Reddit user, Powered_JJ, posted
a solution for implementing the frequency separation technique natively inside of
ComfyUI. Now, I got in touch with them, and with their help, we managed to find
a solution to integrate the details from the original picture into the generated
and re-lit picture all inside of ComfyUI. If you look at the results here side by side,
you can see that the one on the left is the one we had right before we could implement a frequency
separation technique directly inside of ComfyUI, and the one on the right is the one we are getting
now with the frequency separation technique inside of ComfyUI. All the text detail is much better,
as well as any other detail, really. More importantly, it is the same product that we are
trying to sell to potential customers. This is invaluable, of course, in product photography,
where you're not selling a 90% replica but the actual product. Coherent
lighting and getting the actual product into your shots were two
of the major obstacles to using generative AI in production environments until now, and over
the weekend, we blew them out of the water. If you are a photographer, a creative
director, an art director, or a designer in the photography space and you don't see the
potential here — and it's not even potential; it's actual use cases — then I don't know
what to tell you because I've had tons of people over on Instagram, and I’m talking
editors at Vogue, asking me what the heck we just did. So pardon my excitement,
but to me, this is a major breakthrough. But let's not get ahead of ourselves and actually
try to understand what's going on in this workflow that I developed. Let's start from the beginning.
What we're starting here with in the top left corner is an iPhone shot of a product — a very bad
shot at that. It then gets resized to 1024x1024. Here in the top center-right, we have a group that
is currently inactive and generates a product from scratch. We don't need that, since we're
not generating products that don't exist, but I left it in just in case you want to test it when
you don't have any shots you want to work with. Right below, we have three different groups. The
background generator group here generates a new background based on a prompt, using the depth and
the lineart of the original image. Now, why the lineart, you say? Wouldn't the depth be enough?
Well, I found that in some cases with transparent stuff, it's better to have some lineart as well.
You can disable that if you don't want to use it.
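If it helps to see what that group boils down to outside of ComfyUI, here is a rough Python sketch of the same idea using diffusers. This is not the actual node graph from the workflow, just my own approximation, and the model IDs, weights, prompt, and file names are assumptions you would swap for your own:

```python
# Rough approximation of the background generator group: condition a new
# background on the depth and lineart of the original shot.
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from controlnet_aux import MidasDetector, LineartDetector

source = Image.open("iphone_product_shot.jpg").convert("RGB").resize((1024, 1024))

# Preprocess the original shot into depth and lineart control images
depth_map = MidasDetector.from_pretrained("lllyasviel/Annotators")(source)
lineart_map = LineartDetector.from_pretrained("lllyasviel/Annotators")(source)

controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnets, torch_dtype=torch.float16
).to("cuda")

background = pipe(
    "advertising photography of a whiskey bottle on a river, moonlight",
    image=[depth_map, lineart_map],
    controlnet_conditioning_scale=[1.0, 0.6],  # drop or zero the second weight to skip lineart
    height=1024,
    width=1024,
    num_inference_steps=25,
).images[0]
background.save("generated_background.png")
```

After the background gets generated, it gets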
passed on to a "Blend Original Subject on Top of Background" group. This group merges the
original subject of the original picture on top of the generated background by employing
a "Segment Anything" group. In my case, I have to specify that I want a bottle here because
a bottle is the subject of my picture.
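If you are curious what that merge amounts to in plain math, here is a minimal numpy sketch, assuming the subject mask from the segment anything group has been saved out as a grayscale image (the file names are placeholders):

```python
# Blend the original subject on top of the generated background using the
# subject mask (white = subject, black = background).
import numpy as np
from PIL import Image

original = np.asarray(Image.open("iphone_product_shot_1024.png").convert("RGB"), dtype=np.float32)
generated_bg = np.asarray(Image.open("generated_background.png").convert("RGB"), dtype=np.float32)
subject_mask = np.asarray(Image.open("subject_mask.png").convert("L"), dtype=np.float32) / 255.0

# Per-pixel alpha blend: subject pixels come from the original shot,
# everything else comes from the generated background.
alpha = subject_mask[..., None]
blended = alpha * original + (1.0 - alpha) * generated_bg
Image.fromarray(blended.astype(np.uint8)).save("blended.png")
```

Now, the merged picture would be rather bad, at least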
by product photography standards, and this is roughly the point we were at about a month ago.
We could get good enough shots, but they were not convincing because of lighting and because the
subject looked like it was copy-pasted on top. What we can now do with IC-Light, which is a
fantastic piece of tech — I'm not even kidding; I think it's the best thing to come to
generative AI since LoRAs — is relight the picture we just generated. Let's
see how it works. In this relight group, we are sourcing the resulting image that is
the blend of the original image and the new background. Then we are sourcing a mask
built from the white spots in that image, and we are relighting the picture by using those white
areas of the generated image as sources of light.
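If you want a feel for how a light mask like that could be built, here is a small Python sketch. It is only an approximation of the idea, not the actual nodes, and the threshold and blur values are assumptions you would tune:

```python
# Derive a light mask from the bright areas of the blended image, then
# grow and feather it, similar to a grow-mask-with-blur step.
import numpy as np
from PIL import Image, ImageFilter

luma = np.asarray(Image.open("blended.png").convert("L"), dtype=np.float32) / 255.0

# Keep only near-white pixels as light sources
light_mask = (luma > 0.85).astype(np.uint8) * 255

mask_img = Image.fromarray(light_mask)
mask_img = mask_img.filter(ImageFilter.MaxFilter(9))       # grow the mask a little
mask_img = mask_img.filter(ImageFilter.GaussianBlur(12))   # feather the edges
mask_img.save("light_mask.png")
```

Now, this is not always great, so what we can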
do instead is use this load image mask node, which is right now bypassed, and
link it up to the grow mask with blur node. It would still kind of be a
one-click solution, but I wanted to have everything work together without the need for
any external help. The image gets relit now, but the details are not great. So this relit
picture gets passed through an "(Optional) Preserve Details Such as Words from the Original
Image" group. Now, that's a catchy name, I know, but I needed it to be self-explanatory
because I don't know who's going to try their hand at this workflow. The more notes and
the easier to understand, the better, I guess. What this group does — and bear with me because
it gets a bit mathematical and convoluted — is one thing only, really. In this green subgroup of
nodes in the top left corner, it prepares the mask from the segment anything group for later use.
Then it uses a frequency separation technique, which splits an image into a high-frequency
layer (which carries the detail) and a low-frequency layer (which carries color and lighting), and
applies that split to both the original image and the relit image. It does all that
based on some math that works in most cases, but your mileage may vary, so you might need to
fix things up a bit, such as the blur radius, because you might need to retain some details
that are finer than the ones I'm working with.
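If the math is easier to follow as code, here is a minimal sketch of the split itself, assuming the same Gaussian-blur approach as the ComfyUI group (the radius and file names are placeholders):

```python
# Frequency separation: low frequencies = Gaussian blur of the image,
# high frequencies = image minus its blur (the detail layer).
import numpy as np
from PIL import Image, ImageFilter

RADIUS = 6  # assumed value; this is the main knob to tune for your own images

def split_frequencies(path, radius=RADIUS):
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    low = np.asarray(
        Image.fromarray(img.astype(np.uint8)).filter(ImageFilter.GaussianBlur(radius)),
        dtype=np.float32,
    )
    high = img - low  # detail layer, centered around zero
    return low, high

low_orig, high_orig = split_frequencies("iphone_product_shot_1024.png")
low_relit, high_relit = split_frequencies("relit.png")
```

The last step that we developed over the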
weekend is creating a high-frequency layer that is a blend of the high-frequency layer from the
original image and the one from the relit image. It's a very precise blend, because the blend
we want is one that keeps the details of the original subject where the subject is and the details of
the generated image everywhere else. How do we do that? We use the mask from the segment anything group that we
prepared earlier as a mask for an image blend by mask node, taking in both of the high-frequency
layers. Then, the only thing we need to do is use the resulting high-frequency layer (the blend of
the two previous high-frequency layers) and merge it on top of the low-frequency layer from the
generated relit image. Here we have the result.
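And here is a self-contained sketch of that final blend and merge, again just an approximation of what the nodes do, with placeholder file names:

```python
# Blend the two high-frequency layers through the subject mask, then put the
# result back on top of the low frequencies of the relit image.
import numpy as np
from PIL import Image, ImageFilter

def split(path, radius=6):
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    low = np.asarray(
        Image.fromarray(img.astype(np.uint8)).filter(ImageFilter.GaussianBlur(radius)),
        dtype=np.float32,
    )
    return low, img - low

_, high_orig = split("iphone_product_shot_1024.png")
low_relit, high_relit = split("relit.png")
subject = np.asarray(Image.open("subject_mask.png").convert("L"), dtype=np.float32)[..., None] / 255.0

# Image-blend-by-mask equivalent: original detail where the subject is,
# relit detail everywhere else.
high_blend = subject * high_orig + (1.0 - subject) * high_relit

# Recombine with the relit image's low-frequency (color and lighting) layer.
result = np.clip(low_relit + high_blend, 0, 255).astype(np.uint8)
Image.fromarray(result).save("relit_with_details.png")
```

Hoping that you're still following me and you're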
not seeing this as the ramblings of an old man, let's demonstrate that all of this works and that I'm not
just making things up or cherry-picking images. I am going to take this image of a Gucci
bag over here and change the prompt to "a Gucci bag on the water." Copy that prompt over here
in the positive prompt for the relight group, and then change the prompt from "bottle" to
"bag" inside of the groundingDino node. I'm going to hit "Queue prompt", speed things up a
bit, and sit in silence without cutting anything. And there we go. The background could be a
bit better, and you know this is not cherry-picked because the image is not that great, but the
details have been preserved, the light has been changed, and everything is working great. We
even got a tiny reflection here in the smudge of water. Now, I'm going to try again with different
things. I am going to cut to the results so that you don't have to actually wait. You already
know that I'm not cherry-picking. Let's go with a very bad picture of a microphone taken with my
iPhone. So let's change the prompt to "advertising photography of a microphone in front of a swirly
color background." Let's copy that, put that in the positive prompt for the relight group,
change from "white light" to "neon light," and change the prompt for the grounding Dino node from
"bag" to "microphone." Let's hit "Queue prompt." As you can see here, the grounding Dino
group hasn't taken into consideration the arm that is holding up the microphone, so I
might expect something changing in the arm, but we are going to focus on the mic. Here we got
the new background, and the arm has actually been changed. Now we got the relight, and here we
get the preserved details with the relight, all automatically done, all with one
click. It's completely insane to me. Now, if you want to test it yourself, there's the
workflow in the description below, as well as all the models you need. In the next segment, I'll be
talking about how this works a bit more in detail. So if you're not interested in all that mumbo
jumbo, you can just skip ahead to my conclusions, and I won't hold you accountable for that. I
have put a lot of notes into this workflow, so if you read them carefully, I don't think
there's any way in which you can mess up. So let's address how it all starts. This workflow
right now is working at 1024x1024 because IC-Light only works with SD 1.5 models. I feel like this is the
resolution that gets the best detail while still running reasonably well. If you want to upscale,
you should probably do that later in the chain. More specifically, if you add an upscaling group,
it should go between the relight group and the frequency separation
group that keeps the details. In that case, you would also need to resize the mask from the segment
anything node that is being used to blend the two high-frequency layers together.
Keep that in mind, because otherwise those high-frequency layers will be masked with a mask
that is only 1024x1024, and you don't want that.
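As a quick sketch of what that resize amounts to (the 2048x2048 target here is just an assumption based on a 2x upscale):

```python
# Resize the subject mask so it matches the upscaled image before it is used
# to blend the two high-frequency layers.
from PIL import Image

UPSCALED_SIZE = (2048, 2048)  # whatever your upscaling group outputs

mask = Image.open("subject_mask.png").convert("L")
mask.resize(UPSCALED_SIZE, Image.LANCZOS).save("subject_mask_2048.png")
```

Another thing that we have to take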
into consideration is the fact that the segment anything group has
some limits to it. For example, if we wanted to get the whole mic with the
arm as well, we might need to tinker with the prompt a bit, and we might never
get there at all. So for complex scenes, you might need more than one segment anything group,
but for easy enough scenes where the subject is very clear and there are only one or two of them, you can
get away with just one segment anything group. Then there's the matter of this being a
one-click solution. There are ways to achieve better results by tinkering around instead of
treating it as just a one-click solution. In fact, I have provided you with options inside
this workflow as well, and that's why some of the preview image nodes are not preview
images; they are preview bridges. For example, if we don't want the mask that relights
the image to be sourced from the white areas inside the generated background image, we
could open the mask editor and add the kind of lighting we want by drawing a mask inside the
preview bridge node. What we would then need to do is swap the mask from the color node for the
mask coming out of the preview bridge node, hooking that mask up to the grow
mask with blur input. We would have a lot more control over how the light behaves. Now, the
light wouldn't be as organic as the one coming from the background, but we could actually
direct it ourselves. Another thing we could do is load up a mask that
we want and use that as a source of lighting. For example, if we only want strip lights coming from
the sides, we could create a strip-light-shaped mask and use that instead.
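If you would rather generate a mask like that than draw it, here is a tiny sketch of a strip-light mask; the canvas size, strip width, and feathering are arbitrary values to adjust:

```python
# Two white vertical strips on a black canvas, feathered so the light falls off softly.
import numpy as np
from PIL import Image, ImageFilter

w, h, strip = 1024, 1024, 120
mask = np.zeros((h, w), dtype=np.uint8)
mask[:, :strip] = 255    # strip light on the left edge
mask[:, -strip:] = 255   # strip light on the right edge

Image.fromarray(mask).filter(ImageFilter.GaussianBlur(40)).save("strip_light_mask.png")
```

Another thing we could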
do to have more control would be to drop this segment anything mask that we are
using to automate the selection of finer details, and instead use the mask editor to draw over
the details we want to preserve on the relit image ourselves. So in this case, we
would go over the inputs here and the texture of the microphone head. This group of nodes
here would take care of preparing the mask, and we would just hook up the image output of this
convert mask to image node into the mask input of the image blend by mask node, substituting the
mask from the segment anything group.
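As a pointer, here is where a hand-drawn mask would slot into the earlier blend sketch; the file name is a placeholder, and high_orig and high_relit are the high-frequency layers from the frequency separation sketch above:

```python
# Continuing the earlier sketch: swap the segment anything mask for a
# hand-drawn one exported from the mask editor as a grayscale image
# (white over the areas whose detail you want to keep).
import numpy as np
from PIL import Image

drawn = np.asarray(Image.open("hand_drawn_detail_mask.png").convert("L"),
                   dtype=np.float32)[..., None] / 255.0

# Same blend as before, just driven by the drawn mask instead of the
# segment anything mask.
high_blend = drawn * high_orig + (1.0 - drawn) * high_relit
```

So if you hit "Queue prompt," you can see it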
is taking into consideration only the parts we used. The last thing you want to be wary of,
although I tested this with a lot of different images and it kind of works every time, is that
the math here in the frequency separation groups is set for the average use case. If your
original picture is far from average in its lighting conditions or the kind of details it
has, or if you want to keep more details around, you might need to adjust the math
here, which mostly means tinkering with the image Gaussian blur radius. All of this is
basically an approximation of what Photoshop does when using a frequency separation technique.
So if you are already familiar with that, you know how that works. But if you want to keep
finer details, you want to lower the radius a bit, and if you want to keep fewer details,
you want to up the radius a bit. All of these values have to be the same for
every group; otherwise, it won't work.
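If you want to get a feel for what the radius does before touching the workflow, a tiny experiment like this shows which details end up in the high-frequency layer at different radii (the two radii here are arbitrary examples, not the workflow's defaults):

```python
# Save the detail layer at two different blur radii and compare them visually.
import numpy as np
from PIL import Image, ImageFilter

img = Image.open("iphone_product_shot_1024.png").convert("RGB")
arr = np.asarray(img, dtype=np.float32)

for radius in (3, 12):
    low = np.asarray(img.filter(ImageFilter.GaussianBlur(radius)), dtype=np.float32)
    high = arr - low
    # Shift the detail layer around mid-gray so it is viewable as an image.
    Image.fromarray(np.clip(high + 128, 0, 255).astype(np.uint8)).save(f"high_freq_r{radius}.png")
```

Obviously, this workflow also works with shots not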
taken with a phone, and with better shots, of course. So if we load up a good product shot
of a whiskey bottle here, change the prompt to reflect that, let's say "a whiskey bottle on
a river," copy that, put that over here in the relight prompt, and let's say "moonlight," let's
see what happens. Change the GroundingDINO node to reflect that we want a bottle as the subject.
There we go. We get a very good shot with just one click. So product photographers, not everything's
over yet. Starting from a better studio shot definitely gets you better results, but what
really amazes me is the organic progression we get from the starting shot in the top left to the
background generation, the blended generation, the relit generation, and the detail preservation
generation. To me, this is completely insane. Now, I know what you're all thinking: does it
work with people too? Yes, it does, but I need some more time for testing. Why is that? Well,
that's because under harsh lighting conditions, the high-frequency layers keep a lot of shadow
and light information in them, so I need to tinker with things before I feel like I can release
a good working workflow for people as well. Also, these workflows have become
so complex that, if I started from scratch, this video would take like 40 minutes. If you
want to see more about how we came to this, you can look at the live stream that I've linked
here. That's like a 2-hour live stream where I explain my train of thought, so if you're
into that kind of thing, go watch that. Well, as far as the tech goes, I don't know right
now if this is even better than LoRAs. LoRAs are amazing, don't get me wrong, but this solves
so many issues in ways that are actually usable in the real world. Yes, LoRAs may solve an issue, but
they're not one-click solutions. Whereas this, this is a one-click solution for location shoots
or complex studio shoots. If I were a photographer —and I am— I would be really worried about this.
Or, I could get really excited and get to know the tech better and use it. So I guess the choice
is yours. I hope you get to try it out. I hope you get to have some fun with it. I hope you break
things. I would like to hear some feedback on it. Once again, thanks to Powered_JJ over on
Reddit, who developed this group of nodes for frequency separation techniques. Well,
that's it for today. If you liked this video, leave a like and subscribe. My name
is Andrea Baioni. You can find me on Instagram at risunobushi or on the web
at andreabaioni.com. Same as always, I'll be seeing you next week, scared
and confused about what's real anymore.