Hello everyone and welcome to another episode of
Stable Diffusion for Professional Creatives. Today we're going to take a lot of what we learned in
previous videos and turn this very rough Blender composite into good, interesting, and, more importantly, art-directable generated pictures. We are going to start with a very simple scene
like this one and we'll move on to a more complex scene like this one and we're going to end with
a very complex scene like this one in order to generate images such as these while maintaining
the ability to work independently on the individual elements that compose these images, such as the
subject, the backgrounds, the objects inside, the camera angles, or anything really that an
art director or a client may ask us to change. Anyone who's ever tried to generate on commission
knows that clients and art directors are used to tweaking little things here and there. And that's something that Stable Diffusion is not really great at doing. We cannot tell Stable Diffusion to slightly change the camera angle, or to turn the subject slightly towards the left. And since, well, I'm used to doing all those things in a studio or on a
set, I thought, well, what's the closest thing to a set that we can use in order to prepare our
images for Stable Diffusion? And well, what are 3D programs if not a virtual set of sorts? Now, same
as always, you will be able to find this workflow in the description below, along with any nodes or
models you might need. At the end of the video, I will tell you about how this workflow works
and what's it all about. But for now, you just need to know that it's basically two different
workflows tied together. So in order to work in a 3D environment, while having the option of work
on each little thing independently, I realized that I could use a technique that is nothing new
really. It's been used in animations like Dsny films since decades ago. What I'm basically doing
here is just splitting the subject, an object, the mid ground and the background into different
planes, which is basically the same as something like this. And in order to have the files ready,
I cut out everything that I didn't need from these pictures. So in this case, our main subject is a cutout with everything else being transparent; our chair here received the same
treatment, the windows in this lounge have been rendered transparent, and the background doesn't
need transparency, well, because that's the background. And if we hop over to our camera view,
we can see that everything works together. Now, you might say, well, you could do that in
Photoshop as well, there's no need to do this in Blender. And well, yes, that is definitely
true. But that is only true in this case, because in this case, we're only using planes.
We'll see later how we can use actual meshes in order to work on more complex environments.
But in this particular case, what we can do is just select, for example, the planes that we
want, in this case, the chair and the subject, and resize them at will, place them wherever we want to. We can resize the background, we can move it around, we can do anything we want. More importantly,
we can do anything that a client or an art director might want without resorting to
masking, without resorting to inpainting, without resorting to all that kind of stuff.
And the great thing about this workflow, this approach, is that we don't even need to care
too much about comping correctly, about having the right lights, about having the right kind of
depth, because regenerating this composition will take care of all of that. Now jumping back into
ComfyUI, long-time viewers will recognize this workflow. It's very similar to my SDXL Lightning
from Blender to Generative AI workflow. Now, while we could use SDXL Lightning this time as well, since we are using two different ControlNets, one for depth and one for Canny, depending on your hardware you might get faster results with either SDXL Lightning or 1.5. In my case I am using a 1.5 model, because I find that with my particular setup it is a bit faster, and also because later we will be using an IC-Light group, and IC-Light uses 1.5, so loading up two different models, one SDXL Lightning and one 1.5, would put a bit more strain on your hardware. Now, the core of the workflow is the
Screen Share node. The Screen Share node allows us to screen grab from any window. In this case it's grabbing the screen from the viewport inside of Blender, and the area it's grabbing has been set to the camera view. So this is going to be our source image. Then we are getting our depth and our Canny maps extracted from the image, and they're being applied through two different ControlNets. And as a first step we are generating an image at a lowish denoise of 0.45, describing the kind of image that we want to get. In this case, an advertising photo of a
well-dressed man with a tailored suit, sitting in a luxury living room, mountains in the background.
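If it helps to see this first pass as code rather than as a node graph, here is a rough diffusers sketch of the same idea: an img2img generation at a low denoise, guided by a depth and a Canny ControlNet. This is not the actual ComfyUI graph from the video; the model IDs and file names are assumptions, and the control maps are assumed to have been extracted beforehand.

```python
# Rough diffusers equivalent of the first-pass group: img2img at low denoise
# (strength), conditioned by a Depth and a Canny ControlNet. A sketch of the
# idea only, not the ComfyUI graph; model IDs and file names are assumptions.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

depth_cn = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16)
canny_cn = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16)

pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # swap in your preferred 1.5 checkpoint
    controlnet=[depth_cn, canny_cn],
    torch_dtype=torch.float16,
).to("cuda")

source = load_image("blender_viewport.png")   # the screen-grabbed comp
depth = load_image("depth_map.png")           # pre-extracted depth map
canny = load_image("canny_map.png")           # pre-extracted edge map

result = pipe(
    prompt=("advertising photo of a well-dressed man with a tailored suit, "
            "sitting in a luxury living room, mountains in the background"),
    image=source,
    control_image=[depth, canny],
    strength=0.45,                             # the "lowish denoise" from the video
    controlnet_conditioning_scale=[1.0, 0.1],  # depth strong, Canny gentle
    num_inference_steps=20,
    guidance_scale=6.0,
).images[0]
result.save("first_pass.png")
```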
And we get a decent enough result. The issues that we had with the rough comping are being taken
care of, but the lighting is still kind of off, so what we could do is enable this other group
that takes care of relighting. If we hit Queue Prompt again, we can see that the lighting in the intermediate stage is much better, the lighting in the last stage is a lot better, and all from
a very rough comping inside of Blender. Now here there might be an issue. Let's say the client
wants the subject to remain the same, because, let's say, it's a fashion client and they want to sell the clothes that the model is wearing. Well, since we are using two ControlNets, the coordinates of the subject in both the original picture and the generated picture are going to be the same. So what we could do is enable this intermediate group as well, which is a Segment Anything group that grabs the original subject from the original image and places it on top of the generated image.
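To give an idea of what that merge amounts to, here is a minimal sketch of the compositing step, assuming the subject mask has already been extracted (in the workflow, the Segment Anything group does that part); the file names are placeholders.

```python
# Minimal sketch of the "merge original subject" step: paste the subject from
# the source comp on top of the generated image using a binary mask. In the
# workflow the mask comes from a Segment Anything group; here it is assumed to
# be a pre-saved grayscale image that matches the source resolution.
from PIL import Image

original = Image.open("blender_viewport.png").convert("RGB")
generated = Image.open("first_pass.png").convert("RGB").resize(original.size)
subject_mask = Image.open("subject_mask.png").convert("L")  # white = subject

# Because both ControlNets keep the subject in the same spatial coordinates,
# a straight masked paste lines up without any manual alignment.
merged = Image.composite(original, generated, subject_mask)
merged.save("merged.png")
```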
And if we go check out the relighting results, we can see that the subject has remained the
same. Now, a great thing about the Screen Share node is that it keeps on doing its thing while we work inside of Blender. So what we could do is go back inside of Blender, bring up a pop-up node that has been appended at the relighting stage, place it anywhere over the viewport, and begin working on
our picture depending on the client's directions. In this case, let's say the client wants the
subject to be closer to the camera. Now how would you do that inside of Stable Diffusion
only? Well, good thing we are doing it with Blender: we can resize the subject, bring it very, very close, and place it right here, for example, in the center of the frame. Then we just wait for the Screen Share node to do its thing; you just need to hit Live Run, and this way the Screen Share node will continue working in the background. And here we get the result. Now, it's not exactly live, but it's really quite fast. Or let's say that the client wants to flip the room. We can do that while, in the background, the workflow takes care of it and displays the result right here inside of Blender. All automated, all really fast; not really real time, but not not real time either.
It all kind of depends on your hardware. Now, all of this was done with simple planes, and the way these planes work is by adding a shader that is just a normal Principled BSDF with an Image Texture appended onto it. The Color output is going into the Base Color input and the Alpha output is going into the Alpha input.
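For reference, this is roughly what that material looks like if you set it up with Blender's Python API instead of the shader editor; the object and image names are placeholders.

```python
# Blender (bpy) sketch of the plane material described above: an Image Texture
# feeding a Principled BSDF, Color -> Base Color and Alpha -> Alpha, so the
# cutout's transparency is respected. Object and file names are placeholders.
import bpy

mat = bpy.data.materials.new("CutoutPlane")
mat.use_nodes = True
mat.blend_method = "HASHED"  # so Eevee renders the alpha; may vary by Blender version

nodes = mat.node_tree.nodes
links = mat.node_tree.links
bsdf = nodes["Principled BSDF"]

tex = nodes.new("ShaderNodeTexImage")
tex.image = bpy.data.images.load("//subject_cutout.png")  # path relative to the .blend

links.new(tex.outputs["Color"], bsdf.inputs["Base Color"])
links.new(tex.outputs["Alpha"], bsdf.inputs["Alpha"])

bpy.data.objects["SubjectPlane"].data.materials.append(mat)
```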
This way the plane shows our original image together with its transparency. But what if we wanted to have
more precise control with more complex scenes? For example, what if we wanted to have a mesh? A mesh
would allow us to create a more complex scene and to move inside and outside of that mesh as well.
Like in this case where I've recreated this sci-fi scene starting from this starship over here and
adding characters to it. In this case, in order to proceed, we want to update the positive prompt,
of course. So we'll go for advertising photo of a sci-fi movie, mecha robots on patrol on a steel
spaceship, desert planet, sunset, for example, something really simple. Then we'll just have to
check that our area is set correctly, hit Live Run, and as we can see, we start getting our results. Now, in this case I don't care too much about our subjects staying the same in the generated image, so I'm just going to bypass this Merge Original Subject group. But let's say our art director doesn't like this robot over here, so let's move it out of the way. And they also wanted to see the inside of the ship a bit more. So what we want to do is just turn the scene around a bit, reframe the camera, and move our subject robot a bit further inside.
Maybe we want to lower the denoise a bit, let's say to 0.3. And as you can see, we can follow
the directions of our client, or our art director, or whomever, very, very easily. And this is great, because otherwise, if I had to try and randomly get the shot that the client or the art director wants, I don't know how much time I would need. So now we've seen how to work with simple planes only, and with some planes and a single mesh; let's now check out how we can
work with the whole environment being a mesh and our actors, our subjects being planes. Now in
this case I have found this environment mesh that allows us to place our subjects however we want. The only thing that is lacking is a sky, so I'm going to bring that in as a plane. And if we hop into our camera view, we can see that we have all our actors placed inside of the scene. We can see how this is a very rough composite, but we don't need to be precise anywhere outside of the camera view, because Stable Diffusion is not gonna see anything else but that, and we just need that to look good-ish. The generations and the relighting are gonna take care of everything else.
Now, in this case, we've got a very messy scene, a battle scene, so we want to go back to ComfyUI and be sure to change our positive prompt field accordingly: advertising photo of an action movie, Vikings engaged in a tragic battle, in this case. And since I changed the aspect ratio of my viewer, I want to set the area again. By using the pop-up preview node, we can see how the scene is coming together. Now, once again, we can change the camera angle, we can
change the position of our subjects. For example we can switch the camera angle over to here, let's
flip the camera, let's move our actors. The client might say, oh, well, I don't like this guy over
here. Let's move him to the back and let's bring this one to the front. I like him very much.
And we can do that on the fly without having to worry about random generations, without having
to worry about saying, oh, I'll have to do that later because I can't do it now. And while we've
seen this working for people and environments, it can also work with products, with animations, with anything really. And we don't have to worry about our subjects getting changed too much during the generations, because the positioning of the subject is going to be the same in the original pictures as well, so we can just swap the generated subject out and merge in the original one. This is one of the most art-directable and easiest ways that I've found to
provide clients with the ability to change things on the fly. Let's do a bit of a deep dive on the
workflow. Long-time viewers of the channel will recognize this workflow as very similar
to the one that I used for generating in real time while using Blender. In that workflow, I used SDXL
Lightning, whereas in this case, I am using a 1.5 model. That's epicRealism. Which one to choose
is completely up to you. In my case, I'm using 1.5 because with my setup and while recording,
I've found that it's a bit faster, but your mileage may vary. The core of this workflow is the
Screen Share node. The Screen Share node allows us to screen grab from any window. In our case we are screen sharing from Blender: we just have to select Share a Screen, pick our Blender window, and then set our area. In this case the area is the camera view, and by knowing the aspect ratio of our camera view we can then resize our image. In our case, we are resizing to 1920 by 1080, but if you want to be faster, you can use lower resolutions.
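The Screen Share node is a custom node, so I won't pretend this is its internal code, but as a plain-Python idea of what a screen grab plus resize looks like, something along these lines would do (the region coordinates are placeholders):

```python
# Not the Screen Share node's actual code - just a plain-Python sketch of the
# same idea: grab a region of the screen (the Blender camera view) and resize
# it to the working resolution. The region coordinates are placeholders.
from mss import mss
from PIL import Image

region = {"left": 100, "top": 100, "width": 1600, "height": 900}  # camera view area

with mss() as sct:
    shot = sct.grab(region)
    frame = Image.frombytes("RGB", shot.size, shot.rgb)

frame = frame.resize((1920, 1080), Image.LANCZOS)  # or a lower resolution for faster runs
frame.save("blender_viewport.png")
```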
Then we have two preprocessors. Since our images are very depth-intensive, I picked a Zoe Depth Anything preprocessor, because it has an outdoor mode that works great for this kind of complex scene. We also have a Canny edge preprocessor for the outlines. These two preprocessed images get passed on to two ControlNets. The depth one is at strength 1, because we really want the depth to be respected, and the Canny ControlNet is set at strength 0.1, because while I kind of care about outlines, I don't care about them as much as I do about depth. Then we have the usual positive and negative
prompt fields, and we have our KSampler. Now, depending on whether you're using an SDXL Lightning model or a 1.5 model, your settings are going to be different. In my case, for 1.5, I'm using 20 steps at CFG 6 with the Karras scheduler. If you're using a Lightning model, you are probably going to use 8 steps, CFG between 1 and 2, and SGM Uniform as a scheduler.
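Written out as plain values, those two sets of sampler settings look like this; which one applies just depends on the checkpoint you load.

```python
# KSampler settings from the video, one set per model family.
SD15_SETTINGS = {"steps": 20, "cfg": 6.0, "scheduler": "karras"}           # e.g. epicRealism
LIGHTNING_SETTINGS = {"steps": 8, "cfg": 1.5, "scheduler": "sgm_uniform"}  # CFG anywhere between 1 and 2
```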
Then we have our denoise value. Depending on how much we want the generated image to be closer to or further away from our composition inside of Blender, we have to set a denoise. In my case, I want my resulting generated image to be as close as possible, so I've set it to 0.3, but anything really could work; it's all up to you. Regardless of the denoise value, even if you set it to 1, the overall subject is going to stay in the same spatial coordinates, because we have two ControlNets taking care of that. Then we have an optional
Merge Original Subject group. This group is a very simple one, and it uses a Segment Anything group to isolate our subjects and place them on top of the generated image. In this case it still shows the mecha robots, because we didn't use it for that generation, but if we start a new generation we'll see that it works basically in the same way. Then we have our relight group, my white whale, or at least
a trimmed down version of it. It's basically a simplified version of my latest relighting
workflow, the one that takes care of color matching and detail preservation. Instead of
having four different options for color matching, we just have one, and it is the one that I like,
the fourth one. But you can pick any other one, really. You just need to copy and paste from
that workflow. In this part of the workflow, we have to use a 1.5 model, regardless of whether
or not you were using a Lightning model before. So we are loading up our checkpoint and our IC-Light
model. We are extracting a light map from our generated image based on white values. That means
that the areas that are closer to white are going to be the source of our light.
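As a rough illustration of what "based on white values" means, the light map is essentially driven by how bright each pixel is; a hand-rolled version could look like the sketch below. The actual IC-Light group uses dedicated nodes, and the threshold here is an assumption.

```python
# Hedged sketch of the "light map from white values" idea: the brighter an area
# of the generated image, the more it is treated as a light source. The real
# workflow does this with IC-Light nodes; this is only an illustration.
import numpy as np
from PIL import Image

img = np.asarray(Image.open("first_pass.png").convert("RGB"), dtype=np.float32) / 255.0

# Simple luminance -> light map; pixels near white dominate.
luminance = 0.2126 * img[..., 0] + 0.7152 * img[..., 1] + 0.0722 * img[..., 2]
light_map = np.clip((luminance - 0.5) * 2.0, 0.0, 1.0)  # keep only the bright half

Image.fromarray((light_map * 255).astype(np.uint8)).save("light_map.png")
```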
Then our image gets relit, and as you can see, it's not great right now. And that's because the colors are shifted
and the details are kind of different. So what we want to do is go through all of this frequency
separation and color merging part, which takes care of all of that. The original image and the relit image each get split into a high-frequency and a low-frequency layer. The high-frequency layer from the original picture holds the details, while the low-frequency layer, which holds color and light information, is merged and averaged between the original and the relit picture.
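Here is a small sketch of that frequency-separation and merge logic, assuming a Gaussian blur as the low-pass filter and a straight 50/50 average of the low-frequency layers; the actual nodes may weigh things differently, and the file names are placeholders.

```python
# Sketch of the frequency-separation / color-merge step described above.
# The blur radius and the 50/50 average are assumptions; images are assumed
# to be the same size.
import numpy as np
from PIL import Image, ImageFilter

def split_frequencies(path, radius=10):
    img = Image.open(path).convert("RGB")
    low = img.filter(ImageFilter.GaussianBlur(radius))   # low frequency: color and light
    img_a = np.asarray(img, dtype=np.float32)
    low_a = np.asarray(low, dtype=np.float32)
    return img_a - low_a, low_a                          # high frequency: details

high_orig, low_orig = split_frequencies("first_pass.png")
_, low_relit = split_frequencies("relit.png")

# Details come from the original; color and light are averaged with the relit pass.
merged = high_orig + (low_orig + low_relit) * 0.5
merged = np.clip(merged, 0, 255).astype(np.uint8)
Image.fromarray(merged).save("color_matched.png")
```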
On the right, at the end, we have our result. We have a Remap Image Range node which takes care of remapping whites and blacks in case our image turns out too dark. In this case I'm remapping the white point, because it was too dark. And then we have a preview pop-up node appended to the resulting image, so that we can use it inside of Blender to have a live preview without switching windows. So, all in all, I think that this workflow is very versatile in directing tiny little things, and even huge things really, be it camera angles, or the positioning of actors and subjects, or tweaking the little, insignificant things that art directors love so much to do while standing over your shoulder, saying, not too sure about that, can we change that? I thought about this workflow because recently a couple of clients asked for kind of the same thing: they wanted to be able to control a lot of different things in very difficult sets, and this was the easiest way I came up with to deliver all of that attention to detail. So I hope this
is as useful to you as it is to me and I hope you had some fun and learned something new. This is
gonna be it for today. If you liked this video, please leave a like and subscribe. My name is
Andrea Baioni, you can find me on Instagram at risunobushi or on the web at andreabaioni.com and
same as always, I will be seeing you next week.