Relight and Preserve Any Detail with Stable Diffusion

Captions
Hello beautiful people, and welcome to what may seem like a very click-baity episode of Stable Diffusion for Professional Creatives. Well, let me tell you that as a fashion photographer by trade, I would love for this all to be clickbait, but I will show you how we can get from the shot on the left, taken with an iPhone, to the shot in the middle left, where we get a new background, to the shot in the middle right, where we get a relit picture, to a final shot on the right, where we get a relit picture with a new background while keeping fine details such as the text here, all with one click. This is the workflow that we'll be looking at today.

Now, I already hear you asking, "Where's the workflow?" Well, the workflow is in the description below. You can download it, and you can find every model you need as well as any custom nodes you need in the description below, same as always. But since this is not an easy-to-use workflow, please bear with me for a sec while I go through all of it and explain it to you. If you don't want to do that, there are plenty of notes left there, so suit yourself. But don't come back crying when you use the wrong model inside of the IC-Light group.

I am sure that most of you by now are familiar with IC-Light, something that came out a few weeks ago and helps us relight pictures. If you followed my videos from the previous week, you are already somewhat familiar with this workflow, at least with the part up here. I presented this workflow last Monday, and it was basically a way to relight a product image, something that I felt would be a game-changer in product photography. Then, during a live session mid last week, I was challenged by a viewer to keep the finer details (in his case, that was a bottle of whiskey) and preserve them in the final generation.

During that live stream, I said that a good way to do it would be to use a frequency separation technique in Photoshop so that we wouldn't need to spend any time in ComfyUI doing weird things to get the same result. So what changed during these last few days? Over the weekend, a Reddit user, Powered_JJ, posted a solution for implementing the frequency separation technique natively inside of ComfyUI. I got in touch with them, and with their help, we managed to find a way to integrate the details from the original picture into the generated, relit picture, all inside of ComfyUI.

If you look at the results here side by side, the one on the left is what we had right before we could implement a frequency separation technique directly inside of ComfyUI, and the one on the right is what we are getting now with the frequency separation technique inside of ComfyUI. All the text detail is much better, as well as any other detail, really. More importantly, it is the same product that we are trying to sell to potential customers. This is invaluable, of course, in product photography, where you're not selling a 90% replica but the actual product. The ability to have coherent lighting and the ability to have the actual product in your shots were two of the major obstacles for generative AI in production environments up until now, and this weekend we blew them out of the water.
If you are a photographer, a creative director, an art director, or a designer in the photography space and you don't see the potential here (and it's not even potential; it's actual use cases), then I don't know what to tell you, because I've had tons of people over on Instagram, and I'm talking editors at Vogue, asking me what the heck we just did. So pardon my excitement, but to me, this is a major breakthrough.

But let's not get ahead of ourselves and actually try to understand what's going on in this workflow that I developed. Let's start from the beginning. What we're starting with here in the top left corner is an iPhone shot of a product, and a very bad shot at that. It then gets resized to 1024x1024. Here in the top center-right, we have a group that is currently not active, which would generate a product from scratch. We don't need that, since we're not generating products that don't exist, but I left it in just in case you want to test it when you don't have any shots to work with.

Right below, we have three different groups. The background generator group here generates a new background based on a prompt, using the depth and the lineart of the original image. Now, why the lineart, you ask? Wouldn't the depth be enough? Well, I found that in some cases with transparent objects it's better to have some lineart as well. You can disable it if you don't want to use it.

After the background gets generated, it gets passed on to a "Blend Original Subject on Top of Background" group. This group merges the subject of the original picture on top of the generated background by employing a Segment Anything group. In my case, I have to specify that I want a bottle here, because a bottle is the subject of my picture. Now, the merged picture would be rather bad, at least by product photography standards, and this is roughly the point we were at about a month ago. We could get good enough shots, but they were not convincing, because of the lighting and because the subject looked like it had been copy-pasted on top.

What we can now do with IC-Light, which is a fantastic piece of tech (I'm not even kidding; I think it's the best thing to come to generative AI since LoRAs), is relight the picture we just generated. Let's see how it works. In this relight group, we are sourcing the resulting image, which is the blend of the original image and the new background. Then we are sourcing the mask that we get from the white spots: we relight the image by using those white areas in the generated image as sources of light.

Now, this is not always great, so what we can do instead is use the load image mask node, which is currently bypassed, and link it up to the grow mask with blur node. It would still kind of be a one-click solution, but I wanted everything to work together without the need for any external help. The image gets relit now, but the details are not great. So this relit picture gets passed through an "(Optional) Preserve Details Such as Words from the Original Image" group. Now, that's a catchy name, I know, but I needed it to be self-explanatory because I don't know who's going to try their hand at this workflow. The more notes and the easier it is to understand, the better, I guess.
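For those of you who like to see things in code rather than nodes, here is a rough Python sketch of the blend and light-mask steps we just walked through. This is not the actual ComfyUI implementation; the file names and the 0.8 brightness threshold are just illustrative assumptions.

```python
# Rough sketch (not the actual ComfyUI nodes) of two steps from the workflow:
# pasting the original subject over the generated background, and deriving a
# light mask from the bright areas of that background. All images are assumed
# to be the same size (1024x1024 in the workflow).
import numpy as np
from PIL import Image, ImageFilter

subject = np.asarray(Image.open("iphone_shot.png").convert("RGB"), dtype=np.float32) / 255.0
background = np.asarray(Image.open("generated_background.png").convert("RGB"), dtype=np.float32) / 255.0
# Subject mask as produced by the Segment Anything pass (white = subject).
mask = np.asarray(Image.open("subject_mask.png").convert("L"), dtype=np.float32)[..., None] / 255.0

# "Blend Original Subject on Top of Background": keep the subject pixels,
# use the generated background everywhere else.
blended = subject * mask + background * (1.0 - mask)
Image.fromarray((blended * 255).astype(np.uint8)).save("blended.png")

# Light mask from the white areas of the generated background, softened a bit
# (similar in spirit to the grow mask with blur step).
luma = background.mean(axis=-1)
light_mask = Image.fromarray((luma > 0.8).astype(np.uint8) * 255)
light_mask = light_mask.filter(ImageFilter.GaussianBlur(radius=25))
light_mask.save("light_mask.png")  # what the relight stage uses as its light source
```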
What this group does (and bear with me, because it gets a bit mathematical and convoluted) is one thing only, really. In this green subgroup of nodes in the top left corner, it prepares the mask from the Segment Anything group for later use. Then it uses a frequency separation technique, which splits an image into a high-frequency layer (taking care of details) and a low-frequency layer (taking care of color and lighting), and applies that technique to both the original image and the relit image. It does all that based on some math that works in most cases, but your mileage may vary, so you might need to fix things up a bit, such as the blur radius, because you might need to retain details that are finer than the ones I'm working with.

The last step that we developed over the weekend is creating a high-frequency layer that is a blend of the high-frequency layer from the original image and the one from the relit image. It's a very precise blend, because the blend we want is one that has all the details of the original subject where the subject is, and the details of the generated image everywhere else. How do we do that? We use the mask from the Segment Anything group that we prepared earlier as the mask for an image blend by mask node, taking in both of the high-frequency layers. Then the only thing left to do is take the resulting high-frequency layer (the blend of the two previous high-frequency layers) and merge it on top of the low-frequency layer from the generated, relit image. And here we have the result.

Hoping that you're still following me and not seeing this as the ramblings of an old man, let's demonstrate that all of this works and that I'm not just making things up or cherry-picking images. I am going to take this image of a Gucci bag over here and change the prompt to "a Gucci bag on the water," copy that prompt over here into the positive prompt for the relight group, and then change the prompt from "bottle" to "bag" inside of the GroundingDINO node. I'm going to hit "Queue Prompt," speed things up a bit, and sit in silence without cutting anything.

And there we go. The background could be a bit better, and you know this is not cherry-picked because the image is not that great, but the details have been preserved, the light has been changed, and everything is working great. We even got a tiny reflection here in the smudge of water. Now I'm going to try again with different things, and I'm going to cut to the results so that you don't have to wait; you already know I'm not cherry-picking. Let's go with a very bad picture of a microphone taken with my iPhone. Let's change the prompt to "advertising photography of a microphone in front of a swirly color background," copy that, put it in the positive prompt for the relight group, change "white light" to "neon light," and change the prompt for the GroundingDINO node from "bag" to "microphone." Let's hit "Queue Prompt."

As you can see here, the GroundingDINO group hasn't taken the arm that is holding up the microphone into consideration, so I might expect some changes in the arm, but we are going to focus on the mic. Here we got the new background, and the arm has indeed been changed. Now we got the relight, and here we get the preserved details with the relight, all automatically done, all with one click. It's completely insane to me.

Now, if you want to test it yourself, the workflow is in the description below, as well as all the models you need.
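To make that frequency-separation blend a bit more concrete, here is a minimal Python sketch of the same idea, assuming a simple subtract/add split around a Gaussian blur. It is only an approximation of what the node group does; the radius value and file names are illustrative.

```python
# Minimal sketch of the frequency-separation blend: split the original and the
# relit image into low/high frequencies, blend the two high-frequency layers
# with the subject mask, then recombine with the relit low-frequency layer.
import numpy as np
from PIL import Image, ImageFilter

RADIUS = 6  # must be the same for every separation, as noted above

def split_frequencies(img: Image.Image, radius: float):
    """Return (low, high): the blurred image and the detail residual."""
    rgb = img.convert("RGB")
    low = np.asarray(rgb.filter(ImageFilter.GaussianBlur(radius)), dtype=np.float32)
    high = np.asarray(rgb, dtype=np.float32) - low  # can be negative, keep as float
    return low, high

original = Image.open("original.png")
relit = Image.open("relit.png")
# Subject mask prepared from the Segment Anything group (white = subject).
subject_mask = np.asarray(Image.open("subject_mask.png").convert("L"), dtype=np.float32)[..., None] / 255.0

_, high_original = split_frequencies(original, RADIUS)
low_relit, high_relit = split_frequencies(relit, RADIUS)

# Original details on the subject, generated details everywhere else.
high_blend = high_original * subject_mask + high_relit * (1.0 - subject_mask)

# Recombine: relit color/lighting (low frequency) + blended detail (high frequency).
result = np.clip(low_relit + high_blend, 0, 255).astype(np.uint8)
Image.fromarray(result).save("relit_with_details.png")
```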
In the next segment, I'll be talking about how this works in a bit more detail. So if you're not interested in all that mumbo jumbo, you can just skip ahead to my conclusions, and I won't hold you accountable for that. I have put a lot of notes into this workflow, so if you read them carefully, I don't think there's any way you can mess it up.

So let's address how it all starts. This workflow is currently working at 1024x1024 because IC-Light only works with SD 1.5. I feel like this is the resolution that gets the best detail while still working reasonably well. If we want to upscale, we should probably do that later. So if you want to add an upscaling group, you should do that further down the line, more specifically between the relight group and the frequency separation group that keeps the details. In that case, you would also need to resize the mask from the Segment Anything node that is being used to blend the two high-frequency layers together (there's a small sketch of that resize at the end of this segment). Keep that in mind as well; otherwise, those high-frequency layers will be masked with a mask that is only 1024x1024, and you don't want that.

Another thing we have to take into consideration is the fact that the Segment Anything group has some limits. For example, if we wanted to get the whole mic with the arm as well, we might need to tinker with the prompt a bit, and we might also never get there. So for complex scenes, you might need more than one Segment Anything group, but for simple enough scenes where the subject is very clear and there's just one or two of them, you can get away with a single Segment Anything group.

Then there's the matter of this being a one-click solution. There are ways to achieve better results by tinkering around and not keeping it a one-click solution. In fact, I have provided you with options inside this workflow as well, and that's why some of the preview image nodes are not preview images; they are preview bridges. For example, if we don't want the relight mask to be sourced from the white areas inside the generated background image, we could open the mask editor and add the kind of lighting we want by drawing a mask inside the preview bridge node. What we would then need to do is swap the mask coming from the color node for the mask coming out of the preview bridge node, hooking this mask up to the grow mask with blur input. We would have a lot more control over how the light behaves. Now, the light wouldn't be as organic as the one coming from the background, but we could actually direct it in a way that we control.

Another thing we could do is load up a mask of our own and use that as a source of lighting. For example, if we only want strip lights coming from the sides, we could create a strip-light kind of mask and use that instead. Another thing we could do to get more control would be to drop the Segment Anything mask that we are using to automate everything as the source of finer details and instead use the mask editor to draw over the details we want to preserve on the relit image ourselves. So in this case, we would go over the inputs here and the texture of the microphone head. This group of nodes here takes care of preparing the mask, and we would just hook up this convert mask to image node's image output into the image blend by mask node's mask input, substituting the mask from the Segment Anything group.
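Going back to the upscaling note from earlier, here is a minimal sketch of the mask resize you would need if you do insert an upscale step between the relight group and the detail-preservation group. The file names and the 2048x2048 target are illustrative assumptions.

```python
# Bring the Segment Anything mask up to the upscaled resolution before it is
# used to blend the two high-frequency layers.
from PIL import Image

upscaled_relit = Image.open("relit_upscaled.png")           # e.g. 2048x2048 after the upscale step
subject_mask = Image.open("subject_mask.png").convert("L")  # still 1024x1024 from Segment Anything

if subject_mask.size != upscaled_relit.size:
    # NEAREST keeps the mask hard-edged; BILINEAR gives softer edges if you prefer.
    subject_mask = subject_mask.resize(upscaled_relit.size, Image.NEAREST)

subject_mask.save("subject_mask_upscaled.png")  # use this in the high-frequency blend instead
```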
So if you hit "Queue prompt," you can see it  is taking into consideration only the parts   we used. The last thing you want to be wary of,  although I tested this with a lot of different   images and it kind of works every time, is that  the math here in the frequency separation groups   is set for the average use case. If your  original picture is far from average for   lighting conditions or the kind of details you  have or you want to keep more details around,   you might need to tinker around with the math  here, which would mostly involve tinkering with   the image Gaussian blur radius. All of this is  basically an approximation of what Photoshop does   when using a frequency separation technique.  So if you are already familiar with that,   you know how that works. But if you want to keep  finer details, you want to lower the radius a bit,   and if you want to keep fewer details,  you want to up the radius a bit. All   of these values have to be the same for  every group; otherwise, it won't work. Obviously, this workflow works with shots not  taken by a phone as well, and better shots,   of course. So if we load up a good product shot  of a whiskey bottle here, change the prompt to   reflect that, let's say "a whiskey bottle on  a river," copy that, put that over here in the   relight prompt, and let's say "moonlight," let's  see what happens. Change the groundingDino node   to reflect that we want a bottle as a subject.  There we go. We get a very good shot with just one   click. So product photographers, not everything's  over yet. Starting from a better studio shot   definitely gets you better results, but what  amazes me really is the organic progression we   get from the starting shot in the top left, the  background generation, the blended generation,   the relit generation, and the detail preservation  generation. To me, this is completely insane. Now, I know what you're all thinking: does it  work with people too? Yes, it does, but I need   some more time for testing. Why is that? Well,  that's because for harsh lighting conditions,   the high-frequency layers keep a lot of shadow  and light information in them, so I need to tinker   with things before I feel like I can release  a good working workflow for people as well. that's because these workflows have become  so complex that if I started from scratch,   this video would take like 40 minutes. If you  want to see more about how we came to this,   you can look at the live stream that I've linked  here. That's like a 2-hour live stream where I   explain my train of thought, so if you're  into that kind of thing, go watch that. Well, as far as the tech goes, I don't know right  now if this is even better than LORAs. LORAs are   amazing, don't get me wrong, but this solves  so many issues that are actually usable in the   real world. Yes, LORAs may solve an issue, but  they're not one-click solutions. Whereas this,   this is a one-click solution to location shoots  or complex studio shoots. If I were a photographer   —and I am— I would be really worried about this.  Or, I could get really excited and get to know the   tech better and use it. So I guess the choice  is yours. I hope you get to try it out. I hope   you get to have some fun with it. I hope you break  things. I would like to hear some feedback on it. Once again, thanks to Powered_JJ over on  Reddit, who developed this group of nodes   for frequency separation techniques. Well,  that's it for today. If you liked this video,   leave a like and subscribe. 
Once again, thanks to Powered_JJ over on Reddit, who developed this group of nodes for the frequency separation technique. Well, that's it for today. If you liked this video, leave a like and subscribe. My name is Andrea Baioni. You can find me on Instagram at risunobushi or on the web at andreabaioni.com. Same as always, I'll be seeing you next week, scared and confused about what's real anymore.
Info
Channel: Andrea Baioni
Views: 7,423
Keywords: generative ai, stable diffusion, comfyui, civitai, text2image, txt2img, img2img, image2image, image generation, artificial intelligence, ai, generative artificial intelligence, sd, tutorial, risunobushi, risunobushi_ai, risunobushi ai, stable diffusion for professional creatives, comfy-ui, andrea baioni, stable diffusion experimental, sdxl, ic-light, ic light, relighting, 1-Click iPhone shot to AD ready image (IC-Light + preserve details) - SD for Professional Creatives
Id: 3N0vvmAoKJA
Length: 19min 2sec (1142 seconds)
Published: Mon May 20 2024