The "Secret Sauce" to AI Model Consistency in 6 Easy Steps (ComfyUI)

Video Statistics and Information

Captions
Hi, and welcome back to another ComfyUI tutorial. Today is a special day because we're going to combine our skills from previous videos to create the ultimate, fully customizable AI model. In this video we'll cover how to get a digital model's face, choose the right pose, set up the background, and dress up our model. Then we'll improve the face and, as a bonus, enhance the hands. I'll explain each step and every node in detail, and you'll find all the resources in the description box. If you're new here, don't forget to like and subscribe. Now let's get started.

First, we'll generate a face for your digital model. This is all about personal preference; you can create a face from scratch, just like I did here. I'm using the RealVisXL V4.0 Lightning checkpoint model. The prompt should be a close-up photo that details your desired facial features. I'm creating a batch of four images, which I'll use later with the IPAdapter. Additionally, I'm using the Image Save node from the WAS Node Suite to save the generated images to a specific path.

Start a new workflow using the RealVisXL checkpoint model. From the Inspire Pack nodes, open Load Image Batch From Dir to import all the face images we generated and saved; just enter the path here. Next, load the IPAdapter and connect it to the batch images, our model, and the KSampler. Set the denoise strength to 0.75. For the image dimensions, set the width to 832 and the height to 1216.

Now we need to set up our OpenPose. Load the image that has the pose you want to replicate, and use the DWPose preprocessor from the ControlNet Auxiliary Preprocessors group. We'll also need the Apply ControlNet (Advanced) node. Load the ControlNet model, in our case OpenPoseXL2, and connect it with our positive and negative prompts and the KSampler. Make sure to disable the face and hands in the OpenPose preprocessor to give the model freedom in those parts.

Before generating, let me share the simple, generic prompt I'm using.
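As a rough sketch of what a load-image-batch-from-directory node does with the path you give it, here is a small, hypothetical Python helper (the function name and extension list are my own; the real node has its own options for sorting and looping):

```python
from pathlib import Path
import tempfile, os

def list_face_images(dir_path, exts=(".png", ".jpg", ".jpeg", ".webp")):
    """Collect image files from a directory in sorted order,
    roughly what a load-image-batch node does with a path."""
    return sorted(p.name for p in Path(dir_path).iterdir()
                  if p.suffix.lower() in exts)

# Build a throwaway directory to demonstrate; non-images are skipped.
tmp = tempfile.mkdtemp()
for name in ("face_2.png", "face_1.png", "notes.txt"):
    open(os.path.join(tmp, name), "w").close()

batch = list_face_images(tmp)
print(batch)  # ['face_1.png', 'face_2.png']
```

Saving the generated faces to one dedicated folder, as the video does with the Image Save node, is what makes this kind of batch loading reliable.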
Since we want our digital model to wear a desired short-sleeve top, it's easiest to give it a short-sleeve t-shirt in the prompt; this will look better when we use IDM-VTON in upcoming steps. Also, use the Plus Face model instead of the Plus model in the IPAdapter. Let's generate four images and see the result. As you can see, the pose is correct, which means our OpenPose nodes are working properly.

For the background, simply change it in the prompt; nothing complicated here. A handy trick is to use a fixed seed to better control the outcome. Start by generating a batch of four images at once; this speeds up finding the best result. By changing the seed number and running a few more generations, I found an image in the batch that I'm happy with. To select this image, use an Image From Batch node. Remember, index zero means the first image, so if we're picking the second image, its index will be one. Perfect.

There are two methods to make our model wear the target garment. The best way is to use IDM-VTON within ComfyUI, which I explained in detail in my last video. However, this method needs a lot of GPU power and might not work for everyone. The good news is that you can use the IDM-VTON web demo on Hugging Face for the same results, and that's what we'll do.

First, export your generated image and edit it using an image editor like Photopea. Create a new project with dimensions 768 by 1024, which is the resolution required by IDM-VTON. Copy and paste your image into the new project, resize it to fit the canvas, and save it at 768 by 1024. Next, open the IDM-VTON online demo on Hugging Face (links will be in the description). Load your digital model and the target garment, describing it, for example, as a short-sleeve t-shirt. Uncheck the auto mask and use the brush to manually mask the t-shirt for more precision. Now generate the image. Change the seed number and keep generating until you're happy with the result. Once done, download the image.
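The "resize it to fit the canvas" step is ordinary aspect-ratio math. A minimal sketch (the function is my own illustration, not part of any tool mentioned here):

```python
def fit_to_canvas(w, h, canvas_w=768, canvas_h=1024):
    """Scale (w, h) uniformly so the image fits inside the canvas
    while keeping its aspect ratio."""
    scale = min(canvas_w / w, canvas_h / h)
    return round(w * scale), round(h * scale)

# An 832x1216 portrait render scaled to fit IDM-VTON's 768x1024 canvas:
print(fit_to_canvas(832, 1216))  # (701, 1024)
```

Because the source and canvas aspect ratios differ slightly, the fitted image is a touch narrower than 768; you then position it on the 768x1024 canvas before exporting.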
Here's our model in the target clothing, looking great already. There are still things we can improve that simple upscaling won't fix, so let's move to the next step.

To enhance the face, we'll use inpainting with the help of the IPAdapter. Pay close attention, because we're going to unleash an amazing trick for the best quality possible. First, add a Face Bounding Box node from ComfyUI Face Analysis; this detects and crops the face in your image. Use InsightFace as the face analysis model; I'll link a guide below for installing it, since it can be tricky. Next, use the Image Resize node from ComfyUI Essentials. Set the width to match your image width, set the height to zero, choose Lanczos for interpolation, and enable keep proportion. Previewing these nodes, you'll see the face detected. It's close, but adjust the padding percent to 0.6 to give it some space.

Moving forward, add a VAE Encode node and connect it with a Set Latent Noise Mask node. Mask the face, including the hair, but avoid the edges. Use a Gaussian blur on the mask to soften the edges, making the final output look well fused. Next, add a CLIP Text Encode node and write the positive prompt: blonde, beautiful face. Let's copy the previously used KSampler by pressing Ctrl+Shift+V to paste it with all pipelines connected. Connect the CLIP Text Encode node and enable the IPAdapter Plus Face model; this will help us bring back our face features now that the face is closer. Set the denoise strength to 0.5, and don't forget to connect the CLIP with the text encode node, and the latent noise mask with the KSampler. Now our face matches the source model and looks much better.

Now let's place the improved face smoothly on top of our original image. First, create an Image Resize node to return our image to its original size using the face analysis node: convert the width and height to inputs and connect them to the face analysis node's original image width and height.
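The padding step just grows the detected face box by a fraction of its size so the crop includes some surrounding context. A hypothetical sketch of that arithmetic (the real node's padding math may differ in detail; this only illustrates the idea):

```python
def pad_bbox(x, y, w, h, pad, img_w, img_h):
    """Expand a face bounding box by `pad` (a fraction of its size)
    on every side, clamped to the image borders."""
    dx, dy = int(w * pad), int(h * pad)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(img_w, x + w + dx), min(img_h, y + h + dy)
    return x0, y0, x1 - x0, y1 - y0

# A 100x100 face at (300, 200) in an 832x1216 image, padded by 0.6:
print(pad_bbox(300, 200, 100, 100, 0.6, 832, 1216))  # (240, 140, 220, 220)
```

More padding gives the sampler room to blend hair and skin around the face, at the cost of inpainting a larger area.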
Next, we load the ImageCompositeMasked node and connect the face image as the source, the original image as the destination, and the blurred mask we made earlier. We also need to convert the X and Y coordinates to inputs and link them to the original X and Y from the face analysis node. Using a Preview Bridge, we can now see the result: our AI model's face is now highly detailed and closely resembles the source images.

The most frustrating part of AI image generation is hands, and it takes a lot of effort to get them right, but I'm going to show you a great method that really helps. First, we'll use a similar method to what we did for the face. Since we can't detect hands automatically the way we did the face, we'll manually crop them using the Image Crop node from ComfyUI Essentials. Locate the hand position, keeping the width and height at 256 pixels, and adjust until you get the hand cropped correctly. Now that the image is 256x256, we need to upscale it by 4 to get it to 1024x1024 for use with the SDXL model. If you want to use an SD 1.5 checkpoint model, upscale by only 2 to make it 512x512; this applies to the face improvement too.

Next, we use the MeshGraphormer Hand Refiner from ControlNet Auxiliary Preprocessors. This node detects the hand and creates a mask and a depth image with the correct number of fingers in the correct position. Connect this node to an Apply ControlNet (Advanced) node, and for the positive prompt, type "hand". Load a depth ControlNet model; in our case it's the Zoe Depth model for SDXL. If you're using SD 1.5, use the other recommended depth models, which will be linked below. Place these models in your controlnet folder in the models directory. Add a KSampler and connect it with the Apply ControlNet node and your checkpoint model. For the latent, bring in a Set Latent Noise Mask node: convert the hand image to a latent image and connect it with the samples. Also use the mask created by the MeshGraphormer Hand Refiner, adding a bit of Gaussian blur before connecting it to the Set Latent Noise Mask node.
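The masked composite used to paste the improved face back is, at its core, a per-pixel blend: out = src*m + dst*(1-m), with mask values in [0, 1]. A tiny single-channel sketch of that math (my own illustration of the principle, not the node's actual code):

```python
def composite_masked(dst, src, mask):
    """Per-pixel masked blend: out = src*m + dst*(1-m).
    A blurred mask gives fractional values near its edges,
    which is what makes the pasted patch fuse in smoothly."""
    return [[s * m + d * (1.0 - m) for d, s, m in zip(dr, sr, mr)]
            for dr, sr, mr in zip(dst, src, mask)]

dst  = [[0.0, 0.0], [0.0, 0.0]]   # original image (single channel)
src  = [[1.0, 1.0], [1.0, 1.0]]   # improved face patch
mask = [[1.0, 0.5], [0.5, 0.0]]   # blurred mask: hard center, soft edge
print(composite_masked(dst, src, mask))  # [[1.0, 0.5], [0.5, 0.0]]
```

This is exactly why the video blurs the mask first: a hard 0/1 mask would leave a visible seam, while the fractional edge values average the two images together.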
Don't forget to connect the CLIP and the image from the MeshGraphormer node. Generate with a 0.5 denoise strength and check the result; the hand should look much better now. Adjust the denoise strength between 0.4 and 0.5 and play with the seed number until you get a well-structured hand that you're happy with.

Now let's bring this image back into the full model image. Scale the image down by 4, which means 25%. Load an ImageCompositeMasked node: the destination should be the entire model image with the improved face, the source should be the downscaled hand image, and the mask comes from the MeshGraphormer. Convert the X and Y to inputs and connect them to the original X and Y from the Image Crop node. To further improve the result, create a new mask, smooth it with Gaussian blur, and connect it to the ImageCompositeMasked node. Now you have a well-refined hand.

By comparing the image from the beginning with the modified one, you can see the improvements in the digital model's face, pose, background, and clothing. We also fixed the face and hands. If you liked this tutorial, don't forget to like, share, and subscribe. You can find the workflow used in this video, the custom nodes, and the prompts in the description below. See you in the next video.
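The hand fix is a crop/upscale/downscale round trip, and the numbers have to agree at both ends. A small sketch of the bookkeeping (the helper is my own; the crop coordinates are made-up example values):

```python
def hand_roundtrip(crop_x, crop_y, crop_size=256, model="sdxl"):
    """A 256px hand crop is upscaled to the model's working resolution
    for sampling, then downscaled back to 256px and composited at the
    original crop position. Returns (work_size, downscale_percent,
    paste_x, paste_y)."""
    factor = 4 if model == "sdxl" else 2   # 256 -> 1024 (SDXL) or 512 (SD 1.5)
    work_size = crop_size * factor
    return work_size, 100 // factor, crop_x, crop_y

# Hypothetical crop taken at (412, 800) for an SDXL workflow:
print(hand_roundtrip(412, 800))  # (1024, 25, 412, 800)
```

Scaling down by the same factor used to scale up (4x up, then 25%) ensures the refined hand is pixel-for-pixel the size of the region it replaces, so the ImageCompositeMasked paste at the original X/Y lines up.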
Info
Channel: Aiconomist
Views: 41,750
Keywords: comfyui, stable diffusion, ai art, workflow, automatic 1111, forge, midjourney, ai
Id: nVaHinkGnDA
Length: 16min 48sec (1008 seconds)
Published: Wed May 22 2024