Multi Plane Camera Technique for Stable Diffusion - Blender x SD

Video Statistics and Information

Captions
Hello everyone and welcome to another episode of Stable Diffusion for Professional Creatives. Today we're going to take a lot of what we learned in previous videos and turn this very rough Blender compositing into generated pictures that are good, interesting and, more importantly, art directionable. We are going to start with a very simple scene like this one, we'll move on to a more complex scene like this one, and we're going to end with a very complex scene like this one, in order to generate images such as these while maintaining the ability to work independently on the single elements that compose them: the subject, the backgrounds, the objects inside, the camera angles, or anything really that an art director or a client may ask us to change. Anyone who's ever tried to generate on commission knows that clients and art directors are used to tweaking little things here and there, and that's something that Stable Diffusion is not really great at. We cannot tell Stable Diffusion to slightly change the camera angle, and we can't tell Stable Diffusion to turn the subject slightly towards the left. And since I'm used to doing all those things in a studio or on a set, I thought: what's the closest thing to a set that we can use in order to prepare our images for Stable Diffusion? Well, what are 3D programs if not a virtual set of sorts?

Same as always, you will be able to find this workflow in the description below, along with any nodes or models you might need. At the end of the video, I will tell you how this workflow works and what it's all about, but for now you just need to know that it's basically two different workflows tied together. So, in order to work in a 3D environment while having the option of working on each little thing independently, I realized that I could use a technique that is nothing new, really. It's been used in animation, like Disney films, for decades. What I'm basically doing here is just splitting the subject, an object, the mid ground and the background into different planes, which is basically the same as something like this. And in order to have the files ready, I cut out everything that I didn't need from these pictures. So in this case, our main subject is a cutout of the subject with everything else being transparent, our chair here received the same treatment, the windows in this lounge have been rendered transparent, and the background doesn't need transparency, well, because that's the background. And if we hop over to our camera view, we can see that everything works together.

Now, you might say: well, you could do that in Photoshop as well, there's no need to do this in Blender. And yes, that is definitely true. But it is only true in this case, because in this case we're only using planes. We'll see later how we can use actual meshes in order to work on more complex environments. But in this particular case, what we can do is just select the planes that we want, in this case the chair and the subject, resize them at will, and place them wherever we want to. We can resize the background, we can move it around, we can do anything we want. More importantly, we can do anything that a client or an art director might want, without resorting to masking, without resorting to inpainting, without resorting to all that kind of stuff. And the great thing about this workflow, this approach, is that we don't even need to care too much about comping correctly, about having the right lights, about having the right kind of depth, because regenerating this composition will take care of all of that.
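As a loose illustration only (the video doesn't say which tool was used to prepare the transparent cutouts), here is one possible way to produce RGBA PNG layers like these automatically, using the rembg library. Every path below is a placeholder.

# Loose illustration only: producing transparent PNG cutouts to use as
# multiplane layers. rembg is just one possible tool -- the video doesn't
# specify how the cutouts were made -- and all paths are placeholders.
from pathlib import Path

from PIL import Image
from rembg import remove

def make_cutout(src: str, dst: str) -> None:
    """Strip the background from src and save an RGBA cutout PNG to dst."""
    cutout = remove(Image.open(src))  # returns an RGBA image with a transparent background
    cutout.save(dst)

Path("cutouts").mkdir(exist_ok=True)
for layer in ("subject", "chair", "lounge"):
    make_cutout(f"plates/{layer}.jpg", f"cutouts/{layer}.png")

# The background plate stays fully opaque -- no cutout needed there.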
Now, jumping back into ComfyUI, long-time viewers will recognize this workflow. It's very similar to my SDXL Lightning from Blender to Generative AI workflow. While we could use SDXL Lightning this time as well, because we are using two different ControlNets, one for depth and one for Canny, depending on your hardware you might get faster results with either SDXL Lightning or 1.5. In my case I am using a 1.5 model because I find that with my particular setup it is a bit faster, and also because later we will be using an IC-Light group, and IC-Light uses 1.5, so loading up two different models, one SDXL Lightning and one 1.5, is going to put a bit more strain on your hardware.

Now, the core of the workflow is the screen share node. The screen share node allows us to screen grab from any window. In this case it's grabbing the screen from the viewport inside of Blender, and the area it's grabbing has been set to the camera view. So this is going to be our source image. Then our depth and Canny maps get extracted from that image and applied through two different ControlNets. As a first step, we are generating an image at a lowish denoise of 0.45, describing the kind of image that we want to get: in this case, an advertising photo of a well-dressed man with a tailored suit, sitting in a luxury living room, mountains in the background. And we get a decent enough result. The issues that we had with the rough comping are being taken care of, but the lighting is still kind of off, so what we could do is enable this other group that takes care of relighting. If we hit Queue Prompt again, we can see that the lighting in the intermediate stage is much better and the lighting in the last stage is a lot better, and all from a very rough comping inside of Blender.

Now, here there might be an issue. Let's say the client wants the subject to remain the same, because let's say it's a fashion client and they want to sell the clothes that the model is wearing. Well, since we are using two ControlNets, the coordinates of the subject in the original picture and in the generated picture are going to be the same. So what we could do is enable this intermediate group as well, which is a Segment Anything group that grabs the original subject from the original image and places it on top of the generated image. And if we go check out the relighting results, we can see that the subject has remained the same.
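The actual pipeline here is a ComfyUI node graph (linked in the video description), not Python code, but as a rough mental model of this first pass, the same idea, img2img over the Blender capture guided by a depth ControlNet and a Canny ControlNet, can be sketched with the diffusers library. The checkpoint path and image file names are placeholders, not the author's setup.

# Rough diffusers sketch of the core pass described above: img2img over the
# Blender screen grab, guided by a depth ControlNet and a Canny ControlNet.
# This is an analogy to the ComfyUI graph, not the author's actual workflow;
# the checkpoint path and file names are placeholders.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

depth_cn = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16)
canny_cn = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16)

pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "path/to/some-sd15-checkpoint",        # e.g. a photoreal 1.5 model
    controlnet=[depth_cn, canny_cn],
    torch_dtype=torch.float16,
).to("cuda")

source = load_image("blender_screengrab.png")   # the viewport capture
depth_map = load_image("depth.png")             # from a depth preprocessor
canny_map = load_image("canny.png")             # from a Canny preprocessor

result = pipe(
    prompt="advertising photo of a well-dressed man with a tailored suit, "
           "sitting in a luxury living room, mountains in the background",
    image=source,
    control_image=[depth_map, canny_map],
    strength=0.45,                                # the "lowish denoise" step
    controlnet_conditioning_scale=[1.0, 0.1],     # depth strong, Canny light
    num_inference_steps=20,
    guidance_scale=6.0,
).images[0]
result.save("first_pass.png")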
Now, a great thing about the screen share node is that it keeps on doing its thing while we work inside of Blender. So what we could do is go back inside of Blender, bring up a pop-up node that has been appended at the relighting stage, place it anywhere on the screen and begin working on our picture depending on the client's directions. In this case, let's say the client wants the subject to be closer to the camera. Now, how would you do that inside of Stable Diffusion only? Well, good thing we are doing it with Blender: we can resize the subject, for example, bring it very, very close, and place it right here in the center of the frame. Then we just wait for the screen share node to do its thing. You just need to hit Live Run; this way the screen share node will continue working in the background, and here we get the result. Now, it's not exactly live, but it's really quite fast. Or let's say that the client wants to flip the room. We can do that, and in the background the workflow will take care of it and display the result right here inside of Blender. All automated, all really fast, not really real time, but not not real time either. It all kind of depends on your hardware.

Now, all of this was done with simple planes, and the way these planes work is by adding a shader that is just a normal Principled BSDF with an image texture appended onto it. The color is going into the base color, and the alpha is going into the alpha. This way it's taking our original image and the transparency of it. But what if we wanted to have more precise control with more complex scenes? For example, what if we wanted to have a mesh? A mesh would allow us to create a more complex scene and to move inside and outside of that mesh as well. Like in this case, where I've recreated this sci-fi scene starting from this starship over here and adding characters to it. In this case, in order to proceed, we want to fix the positive prompt, of course. So we'll go for: advertising photo of a sci-fi movie, mecha robots on patrol on a steel spaceship, desert planet, sunset, for example, something really simple. Then we'll just have to check that our area is set correctly, and what we want to do is just hit Live Run, and as we can see we start getting our results. Now, in this case I don't care too much about our subjects being the same in the generated image as well, so I'm just going to bypass this merge original subject group. But let's say our art director doesn't like this robot over here, so let's move that out of the way. And they also wanted to see the inside of the ship a bit more. So what we want to do is just turn around the scene a bit, reframe the camera, move our subject robot a bit further inside. Maybe we want to lower the denoise a bit, let's say to 0.3. And as you can see, we can follow the directions of our client or our art director or whomever very, very easily, and this is great because otherwise, if I had to try and randomly get the shot that the client or the art director wants, I don't know how much time I would need in order to do that.

So now we've seen how to work with simple planes only, and with some planes and a single mesh. Let's now check out how we can work with the whole environment being a mesh and our actors, our subjects, being planes. Now, in this case I have found this environment mesh that allows us to place our subjects however we want. The only thing that is lacking is a sky, so I'm going to bring that in as a plane. And if we hop into our camera view, we can see that we have all our actors placed inside of the scene, and we can see how this is a very rough composite. But we don't need to be precise anywhere that is not the camera view, because Stable Diffusion is not gonna see anything else but that, and we just need that to look good-ish. The generations and the relighting are gonna take care of everything else.
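Since those actor planes use the shader mentioned a moment ago, a Principled BSDF fed by an image texture with Color going into Base Color and Alpha into Alpha, here is a minimal bpy sketch of that setup. In the video it's built by hand in the shader editor; the object name and image path below are placeholders.

# Minimal bpy sketch of the cutout-plane shader described above: an Image
# Texture feeding a Principled BSDF (Color -> Base Color, Alpha -> Alpha).
# In the video this is built by hand in the shader editor; the object name
# and image path are placeholders.
import bpy

def cutout_material(name: str, image_path: str) -> bpy.types.Material:
    mat = bpy.data.materials.new(name=name)
    mat.use_nodes = True
    mat.blend_method = 'BLEND'          # show the PNG's alpha as real transparency

    nodes = mat.node_tree.nodes
    links = mat.node_tree.links
    bsdf = nodes["Principled BSDF"]     # created automatically with use_nodes

    tex = nodes.new("ShaderNodeTexImage")
    tex.image = bpy.data.images.load(image_path)

    links.new(tex.outputs["Color"], bsdf.inputs["Base Color"])
    links.new(tex.outputs["Alpha"], bsdf.inputs["Alpha"])
    return mat

# Assign it to one of the cutout planes (object name is hypothetical)
plane = bpy.data.objects["subject_plane"]
plane.data.materials.append(cutout_material("subject_cutout", "//cutouts/subject.png"))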
Now, in this case we've got a very messy scene, a battle scene, so we want to go back to ComfyUI and make sure to change our positive prompt field accordingly: advertising photo of an action movie, Vikings engaged in a tragic battle, in this case. And since I changed the aspect ratio of my viewer, I want to set the area again. By using the pop-up preview node, we can see how the scene is coming together. Now, once again, we can change the camera angle and the position of our subjects. For example, we can switch the camera angle over to here, let's flip the camera, let's move our actors. The client might say: oh, well, I don't like this guy over here, let's move him to the back, and let's bring this one to the front, I like him very much. And we can do that on the fly, without having to worry about random generations, without having to say: oh, I'll have to do that later because I can't do it now. And while we've seen this working for people and environments, it can also work with products, it can also work with anything really, it can also work with animations. And we don't have to worry about our subjects getting changed too much during the generations, because the positioning of the subject is going to be the same in the original pictures as well, so we can just swap it out and merge in the original one. This is one of the most art directionable and easiest ways that I've found to provide clients with the ability to change things on the fly.

Let's do a bit of a deep dive on the workflow. Long-time viewers of the channel will recognize this as a workflow very similar to the one that I used for generating in real time while using Blender. In that workflow I used SDXL Lightning, whereas in this case I am using a 1.5 model, epicRealism. Which one to choose is completely up to you. In my case, I'm using 1.5 because, with my setup and while recording, I've found that it's a bit faster, but your mileage may vary. The core of this workflow is the screen share node. The screen share node allows us to screen grab from any window. In our case we are screen sharing from Blender: we just have to select Share a Screen, pick our Blender window and then set our area. In this case the area is the camera view, and by knowing the aspect ratio of our camera view we can then resize our image. In our case, we are resizing to 1920 by 1080, but if you want to be faster, you can use lower resolutions.
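The screen share node is a ComfyUI custom node, so the details live in the node graph, but as a rough standalone sketch of the same idea, grabbing a fixed region of the screen and resizing it to the working resolution, something like this would do. The region coordinates are placeholders for wherever your camera view sits on screen, and mss is simply one capture library, not necessarily what the node uses.

# Rough standalone sketch of what the screen share node does: grab a fixed
# region of the screen (the Blender camera view) and resize it to the working
# resolution. The region coordinates are placeholders; mss is just one way
# to capture the screen, not what the custom node actually uses.
import mss
from PIL import Image

CAMERA_REGION = {"left": 100, "top": 100, "width": 1600, "height": 900}  # where the camera view sits
TARGET_SIZE = (1920, 1080)   # use a lower resolution if you want faster generations

def grab_camera_view() -> Image.Image:
    with mss.mss() as sct:
        shot = sct.grab(CAMERA_REGION)
        frame = Image.frombytes("RGB", shot.size, shot.rgb)
    return frame.resize(TARGET_SIZE, Image.LANCZOS)

grab_camera_view().save("blender_screengrab.png")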
Then we have two preprocessors. Since our images are very depth intensive, I picked a Zoe Depth Anything preprocessor, because it has an outdoor mode, and that's great for this kind of complex scene. We also have a Canny edge preprocessor for the outlines. These two preprocessed images get passed on to two ControlNets. The depth one is at strength 1, because we really want the depth to be just like that, and the Canny ControlNet is set at strength 0.1, because while I kind of care about outlines, I don't care about them as much as I do about depth. Then we have the usual positive and negative prompt fields, and we have our KSampler. Now, depending on whether you're using an SDXL Lightning model or a 1.5 model, your settings are going to be different. In my case, for 1.5, I'm using 20 steps at CFG 6 with the Karras scheduler. If you're using a Lightning model, you are probably going to use 8 steps, CFG between 1 and 2, and SGM Uniform as the scheduler.

Then we have our denoise value. Depending on how much we want the generated image to be closer to or further away from our composition inside of Blender, we have to set a denoise. In my case, I want my resulting generated image to be as close as possible, so I've set it to 0.3, but anything really could work, it's all up to you. Regardless of the denoise value, even if you set it to 1, the overall subject is going to stay in the same spatial coordinates, because we have two ControlNets taking care of that. Then we have an optional merge original subject group. This group is a very simple one and uses a Segment Anything group to isolate our subjects and place them on top of the generated image. Now, in this case it still shows the mecha robots, because we didn't use it for this last one, but if we start a generation we'll see that it works in basically the same way.

Then we have our relight group, my white whale, or at least a trimmed-down version of it. It's basically a simplified version of my latest relighting workflow, the one that takes care of color matching and detail preservation. Instead of having four different options for color matching, we just have one, and it is the one that I like, the fourth one. But you can pick any other one, really; you just need to copy and paste it from that workflow. In this part of the workflow, we have to use a 1.5 model, regardless of whether or not you were using a Lightning model before. So we are loading up our checkpoint and our IC-Light model, and we are extracting a light map from our generated image based on white values. That means that the areas that are closer to white are going to be the source of our light. Then our image gets relit, and as you can see it's not great right now, and that's because the colors are shifted and the details are kind of different. So what we want to do is go through all of this frequency separation and color merging part, which takes care of all of that. The original image and the relit image get split into a high frequency and a low frequency layer. The high frequency layer from the original picture holds the details, while the low frequency layer, which holds color and light information, is merged and averaged between the original and the relit picture. And on the right here, at the end, we have our result. We have a remap image range node, which takes care of remapping whites and blacks in case our image turns out too dark; in this case I'm remapping the white point because it was too dark. And then we have a preview pop-up node appended to the resulting image, so that we can use it inside of Blender to have a live preview without switching windows.
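As a simplified stand-in for that frequency separation and color merging group (not the actual ComfyUI nodes), the same idea can be written out in a few lines: blur each image to get its low-frequency color-and-light layer, keep the high-frequency detail layer from the original, average the two low-frequency layers, and recombine. The blur radius below is an arbitrary choice.

# Simplified sketch of the frequency separation / color merging step described
# above: details come from the original generation's high-frequency layer,
# while colors and light come from an average of the original and relit
# low-frequency layers. The blur radius is an arbitrary choice, not a value
# taken from the actual workflow.
import cv2
import numpy as np

def split_frequencies(img: np.ndarray, blur: int = 21):
    low = cv2.GaussianBlur(img, (blur, blur), 0)   # colors and lighting
    high = img - low                               # fine detail
    return low, high

original = cv2.imread("generated.png").astype(np.float32)
relit = cv2.imread("relit_ic_light.png").astype(np.float32)

orig_low, orig_high = split_frequencies(original)
relit_low, _ = split_frequencies(relit)

# Average the low-frequency layers, keep the original's high-frequency detail
merged_low = (orig_low + relit_low) / 2.0
result = np.clip(merged_low + orig_high, 0, 255).astype(np.uint8)

cv2.imwrite("relit_merged.png", result)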
So, all in all, I think that this workflow is very versatile for directing tiny little things and even huge things, really, be it camera angles, the positioning of actors and subjects, or tweaking the little, insignificant things that art directors love so much to do while standing over your shoulder and saying: not too sure about that, can we change that? And I thought about this workflow because recently I had a couple of clients who asked for kind of the same thing: they wanted to be able to control a lot of different things in very difficult sets, and this was the easiest way that I came up with in order to be able to deliver all of that attention to detail. So I hope this is as useful to you as it is to me, and I hope you had some fun and learned something new. This is gonna be it for today. If you liked this video, please leave a like and subscribe. My name is Andrea Baioni, you can find me on Instagram at risunobushi or on the web at andreabaioni.com, and same as always, I will be seeing you next week.
Info
Channel: Andrea Baioni
Views: 4,372
Keywords: generative ai, stable diffusion, comfyui, civitai, text2image, txt2img, img2img, image2image, image generation, artificial intelligence, ai, generative artificial intelligence, sd, tutorial, risunobushi, risunobushi_ai, risunobushi ai, stable diffusion for professional creatives, comfy-ui, andrea baioni, sdxl, The 100 years old Tech that makes Stable Diffusion Art Directionable - Blender x SD, art direction, multiplane camera, multi plane, multiplane
Id: 6crgaxoKf8g
Length: 16min 6sec (966 seconds)
Published: Mon Jun 24 2024