Using AI to generate consistent character busts for MetaHuman or Character Creator

Captions
It's been a minute. Alright guys, I know it's been a while since I posted a video; life, that's how it goes. Today I'm basically going to overwrite the first video I made, or maybe extend it, and show you how to make three distinct types of faces using only a couple of keywords, plus an understanding of the changes you can make to the scheduler, the sampler name, and the CFG value, to get the three different images you see on screen. So let's jump into it; I'm trying to keep this video quick, so let's go.

What you're seeing on screen is a new interface. This is different from the first video I produced, where I used an interface called InvokeAI. That one is very similar to the popular alternatives, Automatic1111 and SD.Next: a front end where you have a bunch of sliders, you enter text, and so on. This one is node-based, it provides a lot of flexibility, and it's now the one I use as my daily driver. Even though nodes are "scary" at first, once you get into it you see the power and flexibility you get from a node-based system. It's an open-source alternative, but the developer behind it was actually hired by Stability AI to maintain it, so it gets a lot of features early, and they use it internally for testing their models.

We're going to create this from scratch. This will not be a tutorial on installing ComfyUI or the ComfyUI Manager; I'll leave videos in the description from the people who've explained it best. It's really simple, no different from InvokeAI, but it's better to leave that out of this video so we can keep it short. So let's mute this, push it off to the side, and mute everything here.

The first thing we need to start with is a model. Today we'll be using an SDXL model called NightVision XL. Any model with "XL" in the name is for the SDXL variant; for any that don't, you'll have to check the model page to see what base they're trained on, whether it's SD 1.5 or SD 2.1. So let's go up: the first thing we need is a checkpoint loader, so I'm going to right-click. You always want to get the safetensors version of a model. There we go. Now that we have the model, we'll pull this out, and what we need is a KSampler. It's basically the brains of the operation: it handles the sampler and scheduler names and controls the noise that goes into the generation.

Next we need prompts, and we need the special SDXL ones, so I'm not going to go searching through them; let's just pull them from the top here. So we've got one, and in ComfyUI this comes standard: you can double-click and, depending on the custom nodes you have installed (or just the regular nodes), type the node you want. This is the SDXL CLIP text encoder. This one here, the refiner, we're not going to use today; you can learn all about that elsewhere. This node has everything built in for us: a positive prompt, a detail prompt (this box is CLIP G and this is CLIP L, which I won't get into), and a negative. Let's clear these out so we don't spoil the fun.
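If it helps to see what this graph amounts to under the hood, here's a minimal sketch of the first two nodes in ComfyUI's API (JSON) workflow format, written as a Python dict. CheckpointLoaderSimple and KSamplerAdvanced are real ComfyUI node class names, but the checkpoint filename, node IDs, and starting values here are placeholders, not taken from the video.

```python
# A sketch of the start of this graph in ComfyUI's API (JSON) workflow format.
# Node outputs are referenced as [node_id, output_slot]; for
# CheckpointLoaderSimple, slot 0 = MODEL, 1 = CLIP, 2 = VAE.
workflow = {
    "1": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "nightvisionXL.safetensors"},  # placeholder filename
    },
    "2": {
        # "the brains of the operation": sampler, scheduler, and noise control
        "class_type": "KSamplerAdvanced",
        "inputs": {
            "add_noise": "enable",
            "noise_seed": 0,
            "steps": 20,
            "cfg": 8.0,              # placeholder defaults; tuned later from the model page
            "sampler_name": "euler",
            "scheduler": "normal",
            "start_at_step": 0,
            "end_at_step": 10000,
            "return_with_leftover_noise": "disable",
            "model": ["1", 0],
            # "positive", "negative", and "latent_image" get wired up next
        },
    },
}
```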
Now that we have that, we need two encoders to take this text and turn it into something the KSampler understands, so we can pull those out real quick from here. We actually need the SDXL versions. Again, if your view gets wonky like this, it's a quirk of ComfyUI's interface: just click on the canvas, zoom out, and double-click again. So we need the CLIP text encoders; let me explain them. These encode your text into conditioning to be passed to the KSampler. This one is a prompt node, a custom node I have; it's just easier to have it, and I'll explain why. If I type "SDXL" (there's a bunch here because I have a lot of custom nodes installed), this is the SDXL Prompt Styler. You can find it in one of the custom packages through the Manager, so it's very easy to install; make sure you install the Manager.

What we want to change here: we want to set the width and height to 4096, because that's the size the model's conditioning is trained on, and then we want to tell it the target output, which is 1024 by 1024. Now I'm going to pull that one down from here; we need a positive and a negative. We'll open these up, mute these, and then I'll take my positive G here and there, then my negative here and there, and plug them into the positive and the negative, again 4096 here.

You'll notice that before, I was able to enter the values directly. Let's bring that back up, and you'll see that I could type in the positive CLIP L and CLIP G text, which is what I just plugged in here. On any node, if you want to turn a property into an input, you can right-click on the node's title bar and convert that property to an input. If you ever want to convert it back, the same menu will show "convert back to widget". So now that we've got that, we can close these up, because we won't need to touch them again, and move on.

Now we need a latent image, so we'll pull that out: an Empty Latent Image. This is what generates the noise that goes into the KSampler. If you don't know, Stable Diffusion starts from noise and gradually removes it to create your image. We'll pull down the one we've already used, and we'll pull down another node, these two here; I'm just going to copy those. The first one is the latent image size; I've done the same thing I showed you before, turning these into inputs and plugging them into a preset, just to make it easier so we don't have to remember the numbers.

After that, we want to decode this latent, basically turning the noise into something visual. So we'll add the VAE Decode node here and plug this in. I always forget to plug in the VAE from the model; that needs to go in there, or it's going to error out. Then we need a Preview Image node. Let's verify that everything's working and up and running; we'll just prompt "dog". It says we're missing the target width and height; you'll get this if you're missing any required inputs, and I am: the target width and height that go into here.
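To make the wiring concrete, here's a hedged continuation of the earlier sketch covering this step. CLIPTextEncodeSDXL, EmptyLatentImage, VAEDecode, and PreviewImage are real ComfyUI node class names; the prompt text and node IDs are illustrative.

```python
# Continuing the sketch: SDXL text encoders, the empty latent, and the decode.
workflow.update({
    "3": {
        # Positive conditioning. width/height 4096 = the SDXL conditioning size
        # discussed above; target_width/height 1024 = the resolution we want out.
        "class_type": "CLIPTextEncodeSDXL",
        "inputs": {
            "clip": ["1", 1],
            "width": 4096, "height": 4096,
            "crop_w": 0, "crop_h": 0,
            "target_width": 1024, "target_height": 1024,
            "text_g": "photo of a woman",  # CLIP-G: the main prompt
            "text_l": "detailed skin",     # CLIP-L: the detail prompt
        },
    },
    "4": {
        # Negative conditioning: same node type, different text.
        "class_type": "CLIPTextEncodeSDXL",
        "inputs": {
            "clip": ["1", 1],
            "width": 4096, "height": 4096,
            "crop_w": 0, "crop_h": 0,
            "target_width": 1024, "target_height": 1024,
            "text_g": "makeup", "text_l": "makeup",
        },
    },
    "5": {
        # The noise the KSampler starts from.
        "class_type": "EmptyLatentImage",
        "inputs": {"width": 1024, "height": 1024, "batch_size": 1},
    },
    "6": {
        # Turn the denoised latent back into pixels (don't forget the VAE!).
        "class_type": "VAEDecode",
        "inputs": {"samples": ["2", 0], "vae": ["1", 2]},
    },
    "7": {"class_type": "PreviewImage", "inputs": {"images": ["6", 0]}},
})
# Wire the conditioning and latent into the sampler:
workflow["2"]["inputs"].update({
    "positive": ["3", 0], "negative": ["4", 0], "latent_image": ["5", 0],
})
```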
Again, the target width and height is the resolution we want to output, so let's set that in here. While it's generating, we'll go through the settings real quick. The noise seed: different seeds will give you different results, and this one is set to random. We have our steps, our CFG, our sampler name and scheduler, the start-at and end-at steps, and return-with-leftover-noise. I'll explain some of these quickly, and where you need to look to get the best results out of whatever model you're using.

If you downloaded the model from Civitai like we did (NightVision, photorealistic), you're going to notice a couple of things, and there are a couple of places you should look for this information. First of all, we did get an image, and it looks good with these settings. The noise seed doesn't matter much; it controls some of the variability of what you're doing, and "random" just regenerates this number. The steps value is how many iterations the sampler takes to resolve the noise before it's done. CFG is roughly how creative the model can get. I should back up and say that you need to check the model page to see which CFG values work best for that model, and the same goes for the sampler and the scheduler. Start-at step, end-at step, and return-with-leftover-noise are mostly advanced topics, not for today; we're going to talk about regenerating heads from a control image like we did in the first video, but those are something for you to look into.

So what we need to do is look at this model's page and, based on the example images the author provided, find some of the best settings we can use. We're going for realism, and it looks like a realistic photo, obviously. We can see down here the samplers are in the DPM++ family, so we can be fairly confident that any of the DPM++ samplers will give us close to optimal results. Not to say we can't experiment with other samplers like you saw, but we may want to stick with these before we start experimenting. We'll also notice it uses a very low CFG: instead of the usual seven or eight, it's threes and fours I've seen. So we've got a baseline of where to start. If I go to another image, we see the same thing: low CFG. And it looks like it resolves pretty well at 30 steps. So we've got good information on what we need to get started.

Let's do that: we'll leave the seed on random, drop the CFG down to 3.5, stick with the sampler from the examples, and use this scheduler. If you didn't see where the scheduler was, it's at the end here, down there. I think that's a better image of the dog. Now that we've got this set up and verified it's working, let's move on to the control image. Before we even hook that up, let's grab our prompt; I'm going to use the original prompt I had before, but I'll show you how we can change it and get rid of most of it. And be careful with Stable Diffusion models, especially the new SDXL ones, because you may get something you don't want.
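In terms of the sketch from before, dialing in those Civitai-derived settings is just an update to the sampler node. The specific sampler/scheduler strings below are real ComfyUI values, but treat the choices as starting points read off the model's example images, not gospel:

```python
# Dial in the sampler from the model author's Civitai examples:
# a DPM++ family sampler, low CFG (~3.5), and ~30 steps.
workflow["2"]["inputs"].update({
    "cfg": 3.5,                      # much lower than the usual 7-8
    "steps": 30,                     # resolves well at 30 per the example images
    "sampler_name": "dpmpp_2m_sde",  # any DPM++ variant is a reasonable start
    "scheduler": "karras",
})
```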
As we go down here, I'll explain these styles and how we can use them, but let's go ahead and generate. Right out of the gate we get a pretty good image; this would be totally usable right now. You could just save it, take it into Character Creator's Headshot plug-in, and plug it in. But we have a lot of keywords here, and the more keywords you have (you'll see this called "prompt stuffing"), the more control you lose. Still, that's pretty good.

In the new SDXL workflows there's what they call style prompts. These existed for the original models as well, but in theory they're more heavily used now for the newer models, because it reflects the popularity of tools like Midjourney. And this is the difference: Midjourney looks at your keywords and uses inference to add more words to the prompt, in the background, that you don't see. We can use the same logic, since it's been built in here. I can say I want a glamour photo style. What this does on the back end, even though you don't see it, is add keywords; actually, let's log the prompts so I can show you. It's going to add a couple of different keywords here to change the image. So this is the image before; we're going to keep this fixed from now on. Let's run the prompt and watch how it changes. Now you get more of that glamour, Photoshopped, magazine-photo kind of look just from adding that. And if we look at the logged prompt, this is it here: it added the words "high fashion, extravagant, stylish, sensual, opulent", and so on, all in the background, to produce this image. So we can use that for our own generation purposes, though not all of those, because we'd lose control; we couldn't see what's going on to create our images, like I showed you at the beginning.

So let's jump into it. First we need a ControlNet. Let's get rid of this prompt; we'll keep that one there for the negative. By the way, I didn't explain it: positive, detail, negative. Just think of it that way and don't worry about these too much. I'll grab all of this, and then I'll explain what's going on. Let's move it up here and unmute it. We've got a couple of things going on. First, we're going to load our image; this is a Preview Image. Then we need to load the ControlNet model. In my other video this was just a drop-down and a drag-and-drop; this is a little more involved. First we tell it to apply the ControlNet: you can search for these nodes by typing "controlnet", and we're using the advanced one, Apply ControlNet (Advanced). Then, pulling off of that, we need a ControlNet loader, which loads the model we're going to use. We're using the depth map, the Zoe version of it; there are different ones, all downloadable from Hugging Face, so you can search those out. Since we're using the Zoe model, we need the matching preprocessor: again, you can type that in, and you'll see there are different preprocessors, but today we're using the Zoe one. After that, we need to plug in our positive and negative, and those go on to the KSampler.
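Structurally, the ControlNet branch described above looks something like this in the same API format. LoadImage and ControlNetLoader are core ComfyUI nodes; the Zoe depth preprocessor node comes from a ControlNet-aux custom-node pack, so its exact class name, and the ControlNet filename below, are assumptions to check against your install.

```python
workflow.update({
    "8": {
        # The head reference image we're controlling from.
        "class_type": "LoadImage",
        "inputs": {"image": "head_reference.png"},  # placeholder filename
    },
    "9": {
        # Zoe depth preprocessor, from a ControlNet-aux custom-node pack;
        # the class name may differ slightly in your install.
        "class_type": "Zoe-DepthMapPreprocessor",
        "inputs": {"image": ["8", 0]},
    },
    "10": {
        # The Zoe depth ControlNet model, downloaded from Hugging Face.
        "class_type": "ControlNetLoader",
        "inputs": {"control_net_name": "controlnet-zoe-depth-sdxl.safetensors"},  # placeholder
    },
})
```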
Basically, we're intercepting this conditioning, modifying it, then handing it to the KSampler. So let's drop this in. Today we're only doing one ControlNet; in another video I'll show you how we can stack them and use them not only to generate the face of our model for ideation but also poses, so we could sculpt or model on top of it, things like that. Now that that's into the ControlNet, we plug this into the positive. So our styled prompt goes through, gets encoded, gets modified, gets passed back into the KSampler, and then output. We'll put that back in just the way it was; we got rid of "glamour".

Let's look up here at what I've done: this is the same idea as the start-at/end-at steps down here. I'm telling it to start at zero, when it begins generating the imagery, and end at 25%. Basically: only use this head to dictate how the image looks for the first 25% of the generation.

Alright, now that we've got this, we're rocking and rolling. We obviously want more of that control I talked about, and to generate the three different looks I showed. If you do any more background research on this, you'll run across articles on what's called XY plotting. You can use that; I'm not doing it here because I'm screen recording, and it takes up a lot of memory since it runs a lot of generations at the same time. It gives you a chart, a grid of the different settings you're varying, and it's a very fast way to check how different settings affect your image. We won't do that now, but you can look it up: XY plots.

Now let's get some control over our image and how everything looks, and make it customizable to what we're doing. The things that contribute most to how your images look are the prompting, obviously, but some big factors are the sampler name and the scheduler (which you can usually leave as they are), the CFG (again, based on the model), the steps, and the noise seed. That's it; these will probably stay fixed most of the time. If you're not liking the head you're getting, you can change the seed and that will change it up. You'll get a slightly different generation, a slight variation, so this can be used as fine-tuning for your head. I keep calling it a head; let's say character.

So let's look at how we can change this up with just a few keywords. First, we're going to get rid of a lot of words we don't need anymore because the control image is being used. We'll keep "detailed" in. We don't need "looking at camera" because the control image takes care of that. We'll keep "facing forward" only because removing it sometimes affects the shoulders; we could go back and forth on removing or adding it to see what it does. It depends on the model, which, let me backtrack a little: another big contributing factor is the model you use and how it's trained. Switching models will give you different results, more or less realistic, depending on the model and its training. Alright, back here: let's keep "facing forward".
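The "start at zero, end at 25%" window maps onto the start_percent and end_percent inputs of ComfyUI's Apply ControlNet (Advanced) node, class name ControlNetApplyAdvanced. A sketch of that hookup, with node IDs continuing the earlier sketch:

```python
workflow["11"] = {
    "class_type": "ControlNetApplyAdvanced",
    "inputs": {
        "positive": ["3", 0],      # intercept the encoded prompts...
        "negative": ["4", 0],
        "control_net": ["10", 0],
        "image": ["9", 0],         # the preprocessed depth map
        "strength": 1.0,
        "start_percent": 0.0,      # guide from the very first step...
        "end_percent": 0.25,       # ...but only for the first 25% of generation
    },
}
# ...then hand the modified conditioning to the sampler instead:
workflow["2"]["inputs"].update({"positive": ["11", 0], "negative": ["11", 1]})
```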
We're going to get rid of "symmetrical" and "soft lighting", keep the white background, and we don't need "center frame" anymore. Let's regenerate from here. Alright, we got a white background, her hair is behind her, looking good. Let's take it a little further. Before, I showed you that you can change the photo with the prompt styling, so let's take advantage of that. Instead of saying just "a photo of her" (and this is pretty good; you could use it right now), let's try to change the lighting. We have some shadows here, but we want that magazine-quality look I was talking about. Just by adding a simple word, "glamorous", let's see how this actually changes the prompt. You can see it obviously smoothed out her skin a lot more and touched her up a little: a little Photoshop touch-up, right? Not much. In the negative prompt we're going to add "makeup", so we don't want it to include any makeup. And it removed, what is that on the cheeks, not concealer, blush? Now her skin details are coming through, and for her character that's probably what you want to keep. So that's great.

Let's get rid of "facing forward" and kind of dismantle this prompt all the way down. "Head straight" we don't need; we want to keep the t-shirt and the white background, but we're going to move those out of the top prompt so we have a little more control. One box is meant for natural language and the other is meant for tokenized details. So we'll take "t-shirt" and "white background", since they're just keywords, and place them down here. We'll also take her ethnicity and place it down here. And we're going to put brackets around it, which gives it extra weight, 1.1 I believe. We'll do the same with "detailed skin" and place it at the bottom.

So now we just have "glamorous photo of a woman, tied-back hair". Let's see what we get. Because this is a glamour shot, it's done it in black and white; we got what we wanted, but you can see the t-shirt is missing. So we'll add a couple of things. First we'll add "color" to make sure the color comes out. And this, by the way, is how you should be prompting: prompt something, generate, and if you don't like it, add or remove negative or tokenized prompts, or add to the sentence. So we'll say "tied-back hair", and there we'll say "wearing a t-shirt". Let's move it from here; you could move this back up into the top, but we'll keep it down here. We also want to think about order; it's like order of operations: the earlier something comes in the prompt, the more it matters. "Detailed skin", and let's generate again. Now we've got the color back, she's wearing a shirt, and that looks good. Great, but let's keep going.

Remember what I showed you before in that negative prompt; let's bring that up again. There were a lot of words that were actually added. Let's go up here, because... here we go. There were a lot of words added when that style prompt was set to "glamorous", and it also added negative text: you'll see here "ugly, deformed, noisy, blurry, distorted". You get the picture. It added all of these in.
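For reference, the bracket weighting mentioned above is ComfyUI's standard prompt syntax: (keyword) bumps a phrase's weight to roughly 1.1, and (keyword:1.3) sets it explicitly. Here's a sketch of how the dismantled prompt might split between the natural-language box (CLIP-G) and the tokenized one (CLIP-L); the exact wording is illustrative, not a transcript of the video's final prompt:

```python
# Natural-language prompt vs. tokenized keywords, with bracket weighting.
# "(detailed skin)" weights the phrase ~1.1x; "(detailed skin:1.2)" would
# set the weight explicitly.
text_g = "glamorous photo of a woman, tied-back hair, wearing a t-shirt, color"
text_l = "(detailed skin), (white background), t-shirt"
negative = "makeup, ugly, deformed, noisy, blurry, distorted"

workflow["3"]["inputs"].update({"text_g": text_g, "text_l": text_l})
workflow["4"]["inputs"].update({"text_g": negative, "text_l": negative})
```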
So how would that affect things here? Let's find out. Say we want that glamour shot, that super-Photoshopped, painted look. We'll go in here and paste all of that in: ugly, deformed, noisy, blurry, distorted, you get the point. Let's see how it changes the image; you can already see it in the preview. And there you go. With just a couple of keywords (her pose hasn't changed, obviously, because we're using the control image) we've gone from somebody who may have just stepped outside without any makeup to a very normal glamour shot, with the change of a keyword, by adding just "makeup". Remember, this is in the negative: we're saying we want a glamour shot, but we don't want that overly painted look. You can see she loses the texture in her skin, which is something you may or may not want. But let's go back to the earlier generation real quick: she's got the texture in her skin, the pores, things like that that aren't Photoshopped out, which work well for headshots and help define your character. You can always smooth that later. So you probably don't want to use the glamorous negative prompts like I did, but you can; it's an option.

Moving on, we've generated this image. Again, to show how customizable this can be: now that I've pulled that out, we're not going to put "black", because that can be confused with the actual color rather than the character; I can't type, can I? We're going to put the actual ethnicity so the model understands. There you go, I've switched it up. Again, there are lots of little changes you can make. You can change the control image for the head shape. You can change the scheduler; let's change the scheduler to, yeah, let's use that one. This will change up the image as well, probably slightly. Let's see how much. It didn't change much, because that's in the same family. Let's go to Euler Ancestral. That made a drastic change, and it kind of smoothed the skin out. Again, you can see the control we have with just very slight tweaks.

But I know what you're saying: what about the men? We need a man in there, right? I'm going to use the same head; let's get rid of this and add some detail. We'll say "man with tied-back hair", we'll just say "a photo of the man", and we'll add "detailed" somewhere else. Let's go from here and see what we get. The "makeup" negative will be ignored because he obviously won't have any makeup. Alright, cool. Say we don't want the beard: instead of makeup, we can put "beard" in the negative. The guy looks a lot younger, and there we go. So you can see how flexible this is: change a couple of keywords and move on to a different character.

Alright guys, I'm going to wrap up this video; it's getting a little long in the tooth. In the next video I'll show you how to do what's known as a hi-res fix in ComfyUI. That will let you upscale the image somewhat without running out of memory, and then we'll take it out of ComfyUI and put it into Topaz AI, or whatever you have, to upscale your images to proper 4K and beyond for Character Creator's CC Headshot plug-in. You guys know how YouTube works: give me a thumbs up, subscribe, and I'll do my best to keep producing. But, as always, life. See you guys in the next video.
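If you ever want to script variations like these (swapping schedulers, seeds, or keywords across characters) instead of clicking through the graph, ComfyUI exposes an HTTP endpoint for queueing workflows in this same API format. A minimal sketch, assuming a default local install on port 8188:

```python
import json
import urllib.request

# Queue a workflow on a locally running ComfyUI (default: http://127.0.0.1:8188).
# Rerunning with a different seed, scheduler, or prompt is just a dict edit
# followed by another POST.
def queue_workflow(wf: dict, server: str = "http://127.0.0.1:8188") -> dict:
    data = json.dumps({"prompt": wf}).encode("utf-8")
    req = urllib.request.Request(f"{server}/prompt", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # includes the queued prompt_id

# Example: generate the male variant by editing the prompt and rerolling the seed.
workflow["3"]["inputs"]["text_g"] = "photo of a man with tied-back hair"
workflow["2"]["inputs"]["noise_seed"] = 42
queue_workflow(workflow)
```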
Info
Channel: _its.just.regi_
Views: 1,174
Keywords: filmmaker, bmpcc4k, bmpcc6k, orlando, cinematography
Id: okAHWBmcVWc
Length: 29min 17sec (1757 seconds)
Published: Fri Sep 29 2023