ComfyUI IPAdapter (SDXL/SD1.5): Create a Consistent AI Instagram Model

Captions
Hello everyone, welcome back to Data Leveling. I'm Han, and in today's video we will learn how to create a consistent character face using an image prompt adapter, or what we call IPAdapter. What we want to achieve here is: given an image prompt of a face or clothing, mixed with text prompts, produce a character that maintains the face details consistently. So whether you want to create an AI Instagram model or a consistent face for a certain branding or storytelling effort, this video should help you to a certain extent.

IPAdapter is a project from Tencent AI Lab. A quick explanation of how it works: instead of only using the traditional text-to-image prompts, it adds the ability to use an image as a prompt, or what we call image-to-image. How it differs from previous works that attempted this is that the training process does not affect the text prompt, as it uses a decoupled cross-attention mechanism with separate attention for text features and image features. If you want to learn more about how it works under the hood, you can read their research paper; I will put all the relevant links in the description. I would also like to thank Matteo, the developer who brings us these custom ComfyUI IPAdapter nodes and works really hard to keep them updated and on track with the Tencent AI Lab repository. You can check out his YouTube channel, Latent Vision; his content focuses on IPAdapters, ranging from beginner to advanced level.

Before we move on, I suggest you watch my previous video on installing InsightFace for ComfyUI on Windows, as InsightFace is a requirement for the IPAdapter FaceID models and I see a lot of people having issues installing it. Assuming you have InsightFace installed, let's install the dependencies. Head over to ComfyUI Manager and, under the custom nodes section, search for "ipadapter" and install the ComfyUI IPAdapter Plus nodes. Next, search for "impact pack" and install the ComfyUI Impact Pack nodes. Then search for "controlnet" and install the ControlNet auxiliary preprocessor nodes. And if you're using SD 1.5, search for "ultimate" and install the Ultimate SD Upscale nodes.

We also have to download a CLIP Vision model. From the ComfyUI Manager models tab, search for "clip" and install the one with "ViT-H" in the name; mine says "not installed" only because I actually installed it manually.

For the IPAdapter models there are two parts: FaceID IPAdapters and normal IPAdapters. We will first download the FaceID IPAdapter models: visit the FaceID IPAdapter repository in the description and download the ones highlighted in red. For the normal IPAdapter models, visit the IPAdapter repository in the description; you will see two subsections, one for SDXL and one for SD 1.5. If you're using SDXL, download the ones in red; if you're using SD 1.5, download those models instead.

Now we place the models into the ComfyUI model folders. Files with "lora" in the filename go into the loras folder, and the others go into the ipadapter folder. If you don't have an ipadapter folder, create one and name it "ipadapter" with no spacing. If you are not using the default ComfyUI models folder, remember to update the directory path for ipadapter in ComfyUI's extra_model_paths.yaml file.
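For reference, here is roughly what such an entry could look like. This is a hedged sketch: the layout follows the extra_model_paths.yaml.example file that ships with ComfyUI, but the section name and base path below are made up, and the "ipadapter" key assumes the IPAdapter Plus nodes look for a folder of that name.

```yaml
# Hypothetical extra_model_paths.yaml entry; adjust base_path to your setup.
my_models:
    base_path: D:/ai-models/
    checkpoints: checkpoints/
    loras: loras/          # files with "lora" in the name go here
    ipadapter: ipadapter/  # the remaining IPAdapter models go here
```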
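One more aside before we start building: the decoupled cross-attention mentioned earlier can be sketched in a few lines of PyTorch. This is a toy, single-head reading of the IP-Adapter paper, not the actual implementation; all dimensions and the 0.5 weight are illustrative.

```python
import torch
import torch.nn.functional as F

d = 64                                 # toy feature width
x = torch.randn(1, 77, d)              # UNet hidden states (queries)
text_tokens = torch.randn(1, 77, d)    # text-encoder features
image_tokens = torch.randn(1, 4, d)    # CLIP Vision features via the adapter

to_q = torch.nn.Linear(d, d)
to_k_txt, to_v_txt = torch.nn.Linear(d, d), torch.nn.Linear(d, d)
to_k_img, to_v_img = torch.nn.Linear(d, d), torch.nn.Linear(d, d)  # the only newly trained weights

q = to_q(x)
text_out = F.scaled_dot_product_attention(q, to_k_txt(text_tokens), to_v_txt(text_tokens))
img_out = F.scaled_dot_product_attention(q, to_k_img(image_tokens), to_v_img(image_tokens))

weight = 0.5                           # roughly what the node's weight slider scales
out = text_out + weight * img_out      # the image branch is purely additive
print(out.shape)                       # torch.Size([1, 77, 64])
```

Because the image branch only adds separately projected keys and values, the original text cross-attention weights stay untouched, which is why the adapter does not degrade text-prompt following.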
All right, let's get started. Make sure you restart ComfyUI so everything is updated. The first thing to work on is getting a face. I'll be generating one using an SDXL checkpoint model, as I want the character's face to have more details. If your system is unable to run SDXL models due to the high VRAM requirements, you can still watch through the video first, as the concepts are similar; it just uses different model variants. Towards the end of the video I will also have a section where I show which models to use for SD 1.5.

I will try to create a Korean idol-like girl for the human model, and I will be using the LEOSAM HelloWorld checkpoint model, simply because I like the details of this checkpoint more than the others. As I'm using an SDXL model, we can use larger pixel dimensions to capture more details; I'll be using 1024 pixels. The negative prompts differ slightly between models, hence I just use one that is more generic and good for realism. The prompts I used mainly describe the face that I want: the first two lines set the image environment, and the subsequent lines describe the facial features. As for the sampler, you can play around with the configuration, but I'm using a basic one that usually gives me decent results. For the face feature prompts, I mainly use features that stand out and are commonly found in Korean idols. I also broke the prompt into lines ordered top-down in terms of face structure, so that it is easier to change specific parts. You can add more weight to a specific prompt by highlighting the word and hitting Ctrl+Up. I will also add a group for easy separation: you can do this easily by holding Ctrl and clicking all the nodes you want in the group, then right-clicking a blank space and selecting "Add Group For Selected Nodes".

Once we have confirmed a face to work with, we create a Load Image node. Right-click on the generated face, select "Copy (Clipspace)", then right-click the Load Image node and select "Paste (Clipspace)". The next step is to crop the image so that only the face is in the center. The width and height should be equal, as the FaceID IPAdapter models assume a square input; the x-axis controls the horizontal movement and the y-axis controls the vertical movement. We then go into the IPAdapter nodes and select "Prepare Image For InsightFace". You can sharpen the image if you want by adjusting the sharpening variable, but since my image is already very sharp I will skip this step. The pad_around variable is recommended for better accuracy. Once that is done, we group it up and call this group "Image Pre-processing".

One additional step that I use for slightly better accuracy is to mask out the area of interest. We can do that by creating another Load Image node, copying the clipspace image over, then right-clicking the image and selecting "Open in SAM Detector". SAM stands for Segment Anything Model, developed by Meta: you simply select a point of interest and it segments out the object. A left click adds a positive point prompt and a right click adds a negative point prompt. Once you're ready, click on "Detect". A higher confidence value segments out a more precise object, while a lower confidence segments out a more generic region. If the mask is not perfect, you can right-click on the image, select "Open in MaskEditor", and fix the remaining imperfections. I will first show an example where we only mask out the face, without the hair.
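If you're curious what the SAM Detector step does behind the scenes, here is a rough stand-alone equivalent using Meta's segment-anything package directly. The checkpoint filename is SAM's published ViT-B weights; the input filename and click coordinates are placeholders for your own image.

```python
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Load SAM (ViT-B weights from the segment-anything repo) and set the image.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)
image = np.array(Image.open("face.png").convert("RGB"))
predictor.set_image(image)

# One positive click on the face; label 1 = positive point, label 0 = negative
# (the node's left/right clicks map to these labels).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[512, 400]]),
    point_labels=np.array([1]),
    multimask_output=True,
)

# Keep the highest-scoring proposal and save it as a black-and-white mask.
best = masks[np.argmax(scores)]
Image.fromarray((best * 255).astype(np.uint8)).save("face_mask.png")
```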
The next step is to create an Apply IPAdapter FaceID node. This part is where it gets a little bit tricky, because there is another node called Apply IPAdapter. I will create another checkpoint loader node, as I want to use a different checkpoint from the face generator earlier; I will be using Juggernaut XL for this model. We then create a LoRA loader; I'll be using the node that says "Load LoRA Model Only". Once that is created, join up the lines from the checkpoint model to the LoRA loader to the IPAdapter. For the IPAdapter loader, which I place above my IPAdapter node, select the FaceID Plus V2 SDXL model. For the CLIP Vision node, select the CLIP Vision model you downloaded earlier, and for InsightFace we'll use CPU here. We then link up the image and the mask. For the FaceID configuration, I set the weight to 0.5, and since we are using the V2 model, set faceid_v2 to true and the weight_v2 to 1.5.

It may get a little bit confusing due to all the different models from Tencent AI Lab; just remember that the Apply IPAdapter FaceID node requires a FaceID model, and every other model uses the Apply IPAdapter node. The keyword to look out for is "FaceID" in the model's name. This might change in the future if there is an update to the nodes, but for now this is what we have to do. Before I forget, we have to set the LoRA strength to around 0.4. For the latent image we will use 1024x1024 pixels. The first run usually takes a long time, especially for InsightFace.

Okay, sometimes we see a totally different face when masking just the face, so now I will mask out the entire head, including the hair as well. The hair is consistent now, but the face is still performing poorly. What we can do to improve the accuracy is apply one more round of IPAdapter. You can use either the IPAdapter FaceID node or the normal one; I'll be using the normal one, as I get better results with it. For the IPAdapter model, we'll use IPAdapter Plus Face SDXL. So if you are using the normal Apply IPAdapter node and working with faces, use the model that has "plus-face" in the name; if working with other items in the image, use the model that says IPAdapter Plus SDXL, without the "face" word. We then link up the CLIP Vision and images; for the model, we link it up to the previous Apply IPAdapter FaceID node, and we use the same attention mask as well. For this model it is important to set a lower weight, around 0.2 to 0.4. Now if we look at the results, they are much closer to the reference image.

Let's add some text prompts to set the environment of the output. We can also change the height of the latent image for a portrait look that captures more details. Sometimes we might get an image that is too zoomed out; an SDXL model will not perform well there, as the face requires more pixels to show the details properly. We can fix that by changing the negative prompt, and from here on we can see that every image generated has a face that is much more consistent with the base reference image.
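Once the chain works, you can also drive it outside the browser. Below is a hedged sketch that queues an exported workflow against a local ComfyUI server: the POST /prompt endpoint and the "Save (API Format)" export are standard ComfyUI, but the node id "12" and the exact input names for the FaceID settings are assumptions you would read off your own exported JSON.

```python
import json
import urllib.request

# Workflow exported via "Save (API Format)" in ComfyUI.
with open("workflow_api.json") as f:
    prompt = json.load(f)

# Dial in the FaceID settings used above (node id "12" is hypothetical;
# check your own export for the real id and input names).
prompt["12"]["inputs"].update({"weight": 0.5, "faceid_v2": True, "weight_v2": 1.5})

# Queue the job on the default local server (port 8188).
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": prompt}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode("utf-8"))
```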
Another power of IPAdapter is that we can use it to adapt the clothes for the model. Most of the time we use text prompts to set the clothes, but we might not get exactly what we have in mind. What we can do here is take a reference image of the clothes we want to see on the model and apply it the same way we did before for the face. It is good to colour over the face of the fashion model you are using, with Paint or while snipping the screenshot, so that you do not confuse the model with a face other than the one we want to use. We do the same as before, but this time our IPAdapter model will be IPAdapter Plus SDXL. Connect the lines, and don't forget to mask out the clothes that you want. If your character is fully clothed like mine, do also add some additional portions like the legs, so the IPAdapter model has more to work with. For the strength, we can set it anywhere around 0.3 to 0.5, and you should get decent results. I have to mention that this will not give you an exact copy of the clothes, but it does produce a similarity of at least 85%. Now we can try it on with a different piece of clothing; I will use a grey dress as an example. As we can see, most of the results are good, able to maintain the face while also maintaining the clothes.

But we are not done yet: we can also use a ControlNet to change the pose we want for our AI human model. We start by loading an image of a person doing a certain pose. From the ControlNet preprocessor node section, select the DWPose Estimator; this node allows you to specifically choose which areas you want to detect. For the dimensions, I'll use 1024 to be consistent with the latent image. The bounding box detector uses YOLO, which stands for You Only Look Once, the current state-of-the-art family of object detection models. Both YOLOX and YOLO-NAS work pretty well at determining the bounding box location, but you usually want to choose the large version, as it is trained with more parameters. For the pose estimator I will use the ONNX version as well, because ONNX is commonly used for production in the industry. We then create a Preview Image node to get a sense of what the pose will look like. Next, we create an Apply ControlNet node. Since we are using an SDXL model, we will need an SDXL ControlNet; I will be using the thibaud XL OpenPose model. For the ControlNet strength, set it to a range between 0.8 and 1. I have to mention that when we use the IPAdapter for the clothing, the pose of the fashion model might sometimes be picked up as well, depending on the weight of the IPAdapter; when that happens, remember to increase the ControlNet strength. Let's try out a different pose to see the effects.
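As a rough stand-alone equivalent of the pose-extraction step, the controlnet_aux package can produce the same kind of skeleton map. This sketch uses its OpenPose detector as a stand-in for DWPose; the package and the "lllyasviel/Annotators" weights repo are real, while the filenames are placeholders.

```python
from PIL import Image
from controlnet_aux import OpenposeDetector  # pip install controlnet-aux

# Download the annotator weights and run pose detection at 1024 px,
# matching the latent size used in the workflow.
detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_map = detector(
    Image.open("pose_reference.png"),
    detect_resolution=1024,
    image_resolution=1024,
)
pose_map.save("pose_map.png")  # feed this into Apply ControlNet
```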
All right, I will now move on to SD 1.5. The first thing we need to do is disable the sampling so that it will not run while we are still doing the pre-processing steps: you can right-click on the group, select the nodes and hit Ctrl+B, or right-click and choose "Bypass Group Nodes". For generating the face, we just have to change the model and the latent image size. We then perform the same step of cropping only the face, without the head. Masking still works decently; I just do it out of practice. We then change the checkpoint models, the LoRAs and the IPAdapters: for most of the IPAdapter SDXL models used earlier there is a 1.5 variant as well, you just have to be careful to choose the right one. We also have to change the latent image to 512x768. As you can see, I forgot to change the IPAdapter for the clothes, therefore we get some weird results. For the ControlNet, also change to the SD 1.5 version and change the resolution to 512. Once you are done with it, I suggest upscaling the image; if you don't know how to do it, you can check out my ComfyUI upscale image video, where I share the best ways to upscale an image.

If you are unable to find a pose you want on the internet, you can try out this web application called Magic Poser. It's free and easy to use, and you don't even need an account. Once we are in, we can create a character by selecting "Add". The mouse scroll zooms in and out, holding the left click rotates the camera angle, and holding the right click moves the camera in a 2D plane. If you click on the character and select "Presets", you can also use some of the poses that were already made. You can make adjustments to a pose by dragging the parts that have black labels. Once you have finalized the pose, select "Preview" at the top and screen-capture it. Another cool feature Magic Poser has is that you can change the hands of the model, swapping between the right and left hand, by clicking the label below the hand pose. And if you're using line art or other ControlNet preprocessors where, in some cases, the sky and the ground matter, you can choose to enable or disable them from the toggle tab.

Okay, I think that's about it. If you learned something from this video, do leave a like and subscribe for more content like this; it really helps the channel grow and serves as motivation for me as well. If you face any difficulties following the video, do also leave a comment and I will try my best to help you. And remember: don't stop leveling up.
Info
Channel: Data Leveling
Views: 22,997
Keywords: IPAdapter, SD15, SDXL, CV, SD, IMG2IMG, FaceIDv2, aiart
Id: oYjEFHb--RA
Length: 19min 26sec (1166 seconds)
Published: Wed Jan 24 2024