Image stability and repeatability (ComfyUI + IPAdapter)

Captions
Hello everyone, this is Matteo, and today I'm going to talk about stability and repeatability. In a previous video I showed you how to create small variations over the same subject. Today we'll create a character and put him in various scenarios, trying to keep the same face, of course, but also the same clothing and gadgets.

The first thing that I need to do is the face. I'm using DreamShaper 8 as the main checkpoint. It is an SD 1.5 model that I'm using simply because it is fast, but all that I'm going to show you today also works with SDXL, and actually probably better. So here I have a very simple prompt: "fantasy illustration, male, bearded, 45 years old, half-elf ranger", etc. This gives me this result.

Now, since I want this workflow to be modular, I'm going to split the prompt, so if I want to change a certain aspect of my image I can do that more easily. Let me show you. I'm copying the positive prompt, and in the first one I'm describing the main character, so "fantasy illustration"; I'm removing "closeup" and I'm going to keep the style tokens. In the second prompt I'm only using words that are going to help me with the generation of this first image, because we will have a lot more, so this is basically "closeup portrait, facing camera". I am merging them with Conditioning Concat and then to the KSampler. The generation will change, of course. So what we need to do in this first step is to generate a face that is straight and looking at the camera, so that it will be our reference for the IPAdapter face model later.

Now, this character is a bit vanilla. To improve things a little I can try to add the name of a celebrity. Let's try with Jason Momoa, and I need to lower the strength quite a bit; 0.75 should be fine. Now it is better, and it is also going to help with the stability during the whole process. The image is a little burnt, but I don't want to lower the CFG, so I want to try with RescaleCFG. The multiplier can be even a little lower, 0.65. We connect the model and then to the KSampler. This is good
already, but I want to see if I can get a better stance. I want him to be in a very neutral position and expression, and in this picture I cannot see one ear, so let's see if I can get anything better. This is very good, but I want him to be really straight, facing the camera, so to do that we use a ControlNet. I've already prepared the pose that I like; let me load the image. Now we need the ControlNet node. I lower the strength a little; it's always good to give some freedom to the model. Connect the image. We need to load the ControlNet model; I'm using OpenPose. Then I connect the conditioning and back to the KSampler. Let's see what happens. I could zoom in a little more, but the position is perfect. Let's try another few seeds. Okay, this is very clean and I think it should be fine. Remember that we only care about the face at this stage; I don't care about the armor or whatever.

So now that we have our reference image, I'm going to upscale it with Upscale Image (using Model). We load a model, connect the image. This is upscaled four times, which is a little bit too much, so I'm going to scale it down to two times, 0.5. I'm going to use Lanczos, and I want to try to increase the sharpness with a CAS sharpening node.

Now I'm ready for the second pass. We convert this to the latent space. We copy; with Ctrl+Shift+V I get the node with all the pipe connections ready. Connect the latent, change the seed; 20 steps should be enough. Now I want the image just to be sharp; I don't care if the character is very different from the original. Let's see what we've got. We need to lower the denoise, of course: 0.5, which is pretty high. Let's see. Okay, this is pretty good.

Now I need to cut out the face, and for that I can use a crop image node. If you have the ComfyUI Essentials extension you get Image Crop Plus, which is a little easier to use, but you can also use the default node. I'm going to show you the difference. The Plus one lets you select the starting point for the crop. I can select top center. I
need a bigger window, and then I can set an offset. Now it is centered. Also, the node outputs X and Y positions, and it can be helpful if you want to make changes and then pass the changes back into the original image, but we don't care about that now. So this is the face of our character, and everything else will be just ignored.

Next we need to give a body to this face. Actually, before doing that, let me see if I can make a new picture with this face. So I'm grabbing the first KSampler, and of course I need an IPAdapter node. For the model I'm using IPAdapter Plus Face. Then I need the CLIP Vision; the reference image is our face. Then we need the model pipeline, VAE Decode. Now I'm creating a new positive text prompt and I'm removing from the prompt any physiognomic description of our character, so now the model knows what the face of our character looks like only by looking at our reference image; we have nothing in the text. I'm lowering the weight a little to give some freedom to the model, and I don't want you to have false expectations: the result won't be 100% our reference image. The IPAdapter is not a face swap. So let's see how it goes. And as you can see, the model is now able to generate multiple images, all with the same face.

Now let's see what happens if I change the prompt a little. Instead of a simple leather armor I may want a full plate armor. I'm giving this more weight. Let me also add "closeup", and now it's the same character but with a full plate. Now, since we used a face model, anything that doesn't regard the face is very easy for the model to do. If I want to change anything that regards the face, like the expression, we need to play a little with timestepping. So let me try with "laughing". This is probably not going to work. What I can do is to stop the influence of the IPAdapter at like 60%, and now he's laughing. Or angry. Oh wow, he's very mad. Let's try with an open mouth. So you'll have to play a little with the weight and the timestepping options, but you can get
really interesting results, and if the text prompt is not reacting well you can try with weight type linear, which gives a little more importance to your prompt.

So now the body. I need a new KSampler, another positive prompt, and I'm going to need a Conditioning Concat. I'm merging our generic description of the character with the new text prompt, and to the KSampler. Here I'm just using "standing". Now, to make things simpler, I'm adding a ControlNet. I've prepared an OpenPose stance, this one, very simple. Okay, it works, but I need a bigger latent. Let's do 768 and try again. You probably noticed that I didn't use the reference face, but at the moment I'm only worried about the outfit, so I can keep generating until I find the one that I like. This is a pretty good one. I might also need a negative prompt so that I can exclude details that I don't want, like the sword.

When you find the right picture, you may want to do small variations over that concept, and to do that we can use the KSampler Advanced. We move everything to the new sampler, change all the values. I don't need this anymore. Now I'm converting end_at_step to an input; double click, set it to like 3, and duplicate this node to create the variations. We connect them together. We need to enable return_with_leftover_noise on the first and disable add_noise on the second. We need to set end_at_step back to a widget and set it to whatever big number, and convert start_at_step to an input and connect it to the previous primitive. And so now the two KSamplers are synced. I'm changing the sampler to DPM++ 2M SDE; this is because SDE is less deterministic, more random, so I get different results each time. We should be set, let's see. So now the first three steps are done by the first KSampler and the others by the second. If I change the seed I get a new result.

Now let's build our character with all the features that we created so far. Before going to the next step, maybe I want to do a second pass over this image to get some
crispiness back, so I can create a new KSampler. I'm setting the denoise to like 0.25; let's do 0.35. This is just to remove a few errors from the original generation; not strictly needed, actually. So this would be enough, and I could feed this picture to an IPAdapter, but we would lose a lot of details because the CLIP Vision encoder works with very small images. So what I'm going to do is to split my reference into two parts, one for the legs and one for the torso, and I can use again a crop image node. So now I have two square images that are ideal for the CLIP Vision encoder, and of course over here I have the face.

Now I need a new KSampler and an empty latent, and let's try to compose our new character with three IPAdapters: the first is for the face, the second is for the torso, and the last one for the legs. We already have the CLIP Vision. Then for the face we use the IPAdapter model that we used earlier, and we need another model for the body; I'm using the Plus model. Now the model pipeline goes in the first one, and then I'm daisy-chaining the IPAdapters. I'm increasing the noise on all of them. I want the weight of the face to be pretty high, let's start with 0.85, while the body will be a little lower, 0.7. I'm also going to need a new positive prompt that I'm merging with the original text prompt with a Conditioning Concat like before. I'm leaving "standing" for now. Let me fix the composition, because I wrote "stranding" instead of "standing". Now it is actually working, but I don't have enough room to show the whole character. As you can see, I always get the same outfit; it's not 100% our reference, but good enough.

Let me grab the ControlNet. Now, to help the IPAdapter, I'm going to tell each of the three nodes what part of the picture it is going to influence. So I'm copying this image just as reference, adding a Load Image, pasting my reference, and copying this two more times. So the face should only be applied on top, about here, and I'm sending the mask to the first IPAdapter. The body and the
legs. Let's try again. And now we can generate multiple images that are generally very close; at least the main characteristics are always there: the colors, and of course the face is always the same, and the overall clothing is very, very close. Of course this is a complicated character to do because it has a lot of gadgets and details; it works very well with modern clothing. And of course we can play with the weight of the three IPAdapters, and with the text prompt, and of course with different poses.

Let's try a pose that is completely different from the original. I have this pose for a person playing the lute. Of course, without a lineart or Canny or whatever ControlNet it won't be able to actually place a lute in his hands, so I'm not even going to try, but this is just to show you that we can of course use any kind of pose we want. And I'm also going to change the setting, like "in a forest"; I'm increasing the weight to 1.2. Since we are going to divert substantially from the original pose, I need to give the model more freedom; to do that I need to lower the weight a little, as little as possible, and I'm also starting the generation at 10%, 0.1. This is because the initial steps are the most important, so we give freedom to the model to build our forest, but then we start to apply the IPAdapter right away. Maybe we can also end a little earlier, like at 0.9, and this hopefully will be enough. As you can see, we still have the arm that comes from the original reference, so we probably need to lower the weight a little more, and we can also use the linear weight type, which gives more importance to our text prompt. Still not enough. Let's try to lower the weight a little more; I really want to get rid of the additional arm. And this is pretty good. Now we can try to put him into a tavern. And at this point I have the image that I wanted, and of course I can upscale it to get some more details. We can set a pretty high denoise because we are using the IPAdapter to drive our composition, so
we can use like 0.5, even more probably. And this is pretty good, and the face is still close to our reference.

Now, since we made this workflow modular, by just changing this first prompt I should be able to get a new character with very little effort. The only one that I probably also need to change is this one for the war cry, but everything else should work. Let's try with a woman barbarian and Rebecca Ferguson. For the ladies we need more negatives. Let's see how it goes. Of course women always get +3 magical armor. I would need more time for the outfit, but the face is very good and very consistent. Now let me try with something simpler. This whole process is very effective with simple modern clothing. Let me try the same woman in a modern setting, and I'm sure the result will be a lot better. [Music] And playing a little with the negatives we could get rid of the shorts, probably.

And as always, this is just to get you started, because the improvements that we could do to these workflows are many. Also, starting with a style or character LoRA would drastically increase the stability of the image. And yeah, I hope I gave you some food for thought and material to work on over the weekend.

A quick announcement before I leave. I was a bit scared about opening a Discord server; I really don't have much time for support. But I partnered with A Latent Place; it's a German YouTube channel, but they kindly agreed to convert their current Discord into an international server. So if you have questions about ComfyUI or just want to post your images done following my videos, go to latent.vision/discord or click on the link in the description. So I'll see you there. It's all for today, and ciao.
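The two-sampler variation setup described in the captions (the first KSampler Advanced runs the first three steps and returns its leftover noise, the second continues from the same step without adding noise) can be sketched as a fragment of a ComfyUI API-format prompt. This is only a sketch under assumptions: the function name, the node ids (`model_node`, `positive_node`, `negative_node`, `empty_latent`), the CFG value, the scheduler, and the use of a separate seed on the second sampler are illustrative and are not stated in the video.

```python
import json


def build_variation_samplers(split_step, total_steps, base_seed, variation_seed,
                             cfg=7.0, sampler_name="dpmpp_2m_sde",
                             scheduler="karras"):
    """Return two chained KSamplerAdvanced nodes in ComfyUI API format.

    The upstream node ids below are hypothetical placeholders; replace
    them with the ids used in your own workflow.
    """
    if not 0 < split_step < total_steps:
        raise ValueError("split_step must be between 1 and total_steps - 1")

    # Settings shared by both samplers so their step schedules stay in sync.
    shared = {
        "model": ["model_node", 0],
        "positive": ["positive_node", 0],
        "negative": ["negative_node", 0],
        "steps": total_steps,
        "cfg": cfg,  # assumed value, not given in the video
        "sampler_name": sampler_name,
        "scheduler": scheduler,  # assumed value, not given in the video
    }

    # First sampler: adds the initial noise, stops early, keeps leftover noise.
    first = {
        "class_type": "KSamplerAdvanced",
        "inputs": {
            **shared,
            "add_noise": "enable",
            "noise_seed": base_seed,
            "latent_image": ["empty_latent", 0],
            "start_at_step": 0,
            "end_at_step": split_step,
            "return_with_leftover_noise": "enable",
        },
    }

    # Second sampler: no new initial noise, resumes at the split step and
    # runs to the end ("whatever big number" for end_at_step).
    second = {
        "class_type": "KSamplerAdvanced",
        "inputs": {
            **shared,
            "add_noise": "disable",
            "noise_seed": variation_seed,
            "latent_image": ["sampler_base", 0],
            "start_at_step": split_step,
            "end_at_step": 10000,
            "return_with_leftover_noise": "disable",
        },
    }
    return {"sampler_base": first, "sampler_variation": second}


if __name__ == "__main__":
    graph = build_variation_samplers(split_step=3, total_steps=20,
                                     base_seed=1, variation_seed=2)
    print(json.dumps(graph, indent=2))
```

Keeping the split step in a single argument mirrors the shared primitive in the video: `end_at_step` on the first node and `start_at_step` on the second can never drift apart. Changing `variation_seed` is one plausible way to get a new variation with an SDE sampler while the first three steps stay fixed.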
Info
Channel: Latent Vision
Views: 26,343
Id: 6i417F-g37s
Length: 18min 42sec (1122 seconds)
Published: Fri Dec 08 2023