ComfyUI: IP Adapter Workflows (Tutorial)

Captions
Hi, I am Mali, and welcome to the channel. IP Adapter allows you to generate an output based on a source image. You can control your subject's position using attention masking, or blend multiple subjects and an environment into a single composition using Conditioning Combine. You can create multiple variations from a source, and by adding mask conditioning, you can manipulate the subject further. Let me show you the different IP Adapter workflows in ComfyUI. [Music]

A big shout out to the channel members for their continued support. This tutorial includes multiple workflows; all of them are available via members-only posts on YouTube. This is a basic tutorial; however, ComfyUI knowledge is required. If you are new to Stable Diffusion, check out the video in the description; it explains the basics of ComfyUI. ComfyUI Manager has to be manually installed. Go to the Manager and install the ComfyUI IPAdapter Plus node; also install ComfyUI Essentials. From the Install Models tab, search for "clip vision" and install both models used by IP Adapter. It's recommended to download the models directly from the Manager, as they get downloaded into the appropriate folder. For the IP Adapter models, go to the Hugging Face page, download all the .safetensors files within both folders, and place them in your ComfyUI/models/ipadapter folder.

Click on Load Default; this will load the basic workflow. Search for and add the IPAdapter Apply node and a Load Image node, then drag from the IPAdapter inputs and add the following nodes. I am using an SDXL checkpoint called Juggernaut XL; for now, choose the base SDXL model for both. Later in the tutorial, I will explain which models match each other. To organize your saved files, right-click and convert the filename widget to an input, then connect it to a Primitive node. By doing this, you can make Comfy create subfolders in your output folder and save files there; the code is given in the description. Ensure that the empty latent resolution matches the Stable Diffusion version. A simple prompt that includes "cute kitten" and "Christmas". Let's queue the prompt.

The reference image here gets encoded by the CLIP Vision model. This encoder resizes the input image to 224 resolution and crops it at the center. Now, if you use an image where the subject is not in the center, you will get an undesired result. This won't affect the current image, as the cats are in the center; however, it changes the environment slightly. To avoid this, add a node called Prep Image For ClipVision, which will do all these operations for you, and you can see a preview of the cropped image. Let's say you take a very tall image: without this node, the CLIP Vision encoder would get the following image. With this node, you can set the crop position to top and get the subject's face centered. If the image is too blurry, try using the sharpening option. The image on the left is without the node. As said earlier, since the source image is a 1:1 ratio, it did not affect the main subjects; there are minor differences in the background. Keep this in mind when using IP Adapter.

These two image encoders must be paired with the correct IP Adapter model. Typically you would pair the model with the matching CLIP Vision; however, only the base SDXL IP Adapter model works with the matching CLIP Vision. To run any other SDXL model, you should use the SD 1.5 CLIP Vision. When pairing non-SDXL models, all will work with the SD 1.5 CLIP Vision except one; to use that IP Adapter, you should pair it with the SDXL version of CLIP Vision. Any mismatch in the pairing will give you the following error, so keep this in mind when using IP Adapters.
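As a rough illustration of the preprocessing described above (the behavior that Prep Image For ClipVision exposes with more control), here is a minimal Python/PIL sketch. The 224x224 target comes from the transcript; the crop-then-resize order and the file names are my own assumptions.

```python
from PIL import Image

def center_crop_for_clip_vision(img: Image.Image, size: int = 224) -> Image.Image:
    """Approximate the CLIP Vision preprocessing described above:
    crop the largest centered square, then resize to size x size.
    A sketch for illustration, not ComfyUI's exact implementation."""
    w, h = img.size
    side = min(w, h)                  # largest square that fits
    left = (w - side) // 2            # center horizontally
    top = (h - side) // 2             # center vertically
    square = img.crop((left, top, left + side, top + side))
    return square.resize((size, size), Image.Resampling.LANCZOS)

# Example: a wide image loses its left/right edges here, which is why
# off-center subjects get cut off without the prep node.
cropped = center_crop_for_clip_vision(Image.open("reference.png"))
cropped.save("clip_vision_input.png")
```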
I am duplicating the nodes five times; this way, we can compare the results of the interpolation methods in the Prep Image For ClipVision node. The method you choose in the node affects the generation. For the tutorial, I use the first one, lanczos; I find it the best. Nearest is the worst. Bilinear is better than nearest; however, it is still blurry. Box is mainly used for downscaling; the contrast and details are less when compared to lanczos. Hamming offers a balance between detail and smoothness. Bicubic is the second best of the lot.

IP Adapter has three weight types: original, linear, and channel penalty. These are at strength 0.6. Original is very strong. Channel penalty, on the other hand, is too sharp; notice the sharpness around the ears. The prompt says "cute kitten"; if I increase the strength to one, then we get two kittens for linear. However, at 0.6 strength, linear leans more towards the prompt. I found this evident while using masking and combine as well. Whenever you want the model to give more weight to the prompt, reduce the image weight and choose linear. Later in the tutorial, I will show you a practical example of choosing linear over original.

The source image I used is square and contains two subjects. I often stumble upon the AI creating additional or distorted subjects when going very wide or too tall. Let's pick a wider aspect ratio and see what happens. Okay, three cats. Let's try some tall ratios as well. This one is fine. Let's go even taller. Yep, this one is elongated. These distortions can be fixed with attention masking. Duplicate the Load Image node; I am copying the output just for the ratio. The image does not matter, as I only connect the mask output to the IP Adapter. The reason multiple subjects are created is that there is no confinement. What I will do here is just make a small rough mask; the AI will try to generate the two kittens within this space. Cute. Let me edit the mask again; I will try to expand the mask as much as possible towards the edges of the visible cat. Let's see what happens when I do that. Even though the mask was a rectangle, it generated only two cats, and now I can see more of the original image in the background. So a wider mask area gave the AI more room to integrate the other elements from the source image.

It would be interesting to see what happens if I take the original image and inpaint both cats as a mask. I don't know why I am drawing so perfectly; basically, you can do anything rough. So, it changed the perspective a little. Let's try the linear weight type. And it did nothing; I guess that's because the weight is still one. I will revert to original for now and only reduce the weight to see what it does. Okay, let's stick with this. I want to try and manipulate the position of the subject. I will copy this ultra-wide image again; it makes it easier to define the position. Let's try the left. Only one cat? Wait a second, it's probably because of the weight. That's better. Let's try masking them on the right, making the mask a bit wider. Cute. Again, it got the elements from the background and included them in the generation. Now I want to see what it does if I create two separate masks. Let's do right and left, with some gap in the center to really separate them; see the sketch after this section. This should be interesting. Nice, did not expect that. Cool trick, huh?

The goal here is to try and blend two subjects with an environment. Aesthetics are not essential, but I would like to see what works and to what level I have control. I will quickly duplicate the IP Adapter nodes.
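If you would rather script the rough attention masks than paint them in the mask editor, a sketch like the one below produces the left/right masks with a center gap described above. The canvas size, rectangle coordinates, and file names are placeholder assumptions.

```python
from PIL import Image, ImageDraw

# Canvas matching the ultra-wide empty latent; dimensions are an assumption.
W, H = 1344, 768

def rect_mask(box: tuple) -> Image.Image:
    """White-on-black mask: white pixels mark where this IP Adapter's
    attention is confined (the 'rough mask' from the transcript)."""
    mask = Image.new("L", (W, H), 0)               # black = ignored
    ImageDraw.Draw(mask).rectangle(box, fill=255)  # white = attended
    return mask

# Two rough masks with a gap in the center, one per IP Adapter.
rect_mask((0, 100, 500, H - 50)).save("mask_left.png")
rect_mask((W - 500, 100, W, H - 50)).save("mask_right.png")
```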
When you have two or more IP Adapters, the checkpoint connects to the first, and the KSampler connects to the last IP Adapter node; you then daisy-chain all the IP Adapters in between. I will try blending the cat and the dog in one image before adding the environment. I added "dog" to the prompt. With the weight being one, I got a hybrid of a dog and a cat. I will not add any noise here, but just reduce the weight to half for each. That did something, but it looks like the first weight counts for way more than 50%; the two don't seem to be equal. Let's add some noise and see if that makes any difference; technically, it should not. The dog is now in the background, and it makes things worse.

Let's try color masking here. I will use Load Image (as Mask); it allows me to define the channel. I will explain that in a bit. So here you choose three colors: red, green, and blue. Which color stands for what doesn't matter. I have taken blue as the background, and I will define the subjects as red and green. One thing you should take note of is that the colors must be pure red, green, and blue; any color variation will not work. The mask can be created in Paint, Photoshop, or any graphics editing tool you have, and the drawing only has to be a rough estimate. I'll set the cat as red and the dog as green. And... cute. Queue the prompt. The outcome is as expected.

However, you have to be careful here with the order of the flow. Let me explain: the cat image is connected to the first IP Adapter, the dog to the second. Similarly, red is on the left and green is on the right, and the prompt says "cute kitten and dog". In all three places, the cat comes first and then the dog. Changing the order partially will give undesired results when using multiple IP Adapters with masking; the sequence directly affects the output. The order of the inputs, that is, the mask, image, prompt, and IP Adapter nodes, matters. To demonstrate, let's say I change the prompt to "cute dog and kitten" and don't change anything else. Check out the result: just changing the prompt order generated a dog-cat hybrid. Let me revert the prompt and change the color mask order instead; mind you, the masks are still connected to the correct IP Adapters. A different image, but still a hybrid. Why does this happen? The model processes the inputs from left to right because the nodes are connected linearly. In the mask, red is on the left; since both masks are connected to the IP Adapters, the model assigns red to whatever is given in the first image and prompt. Keep this in mind when using multiple IP Adapters.

To get a more coherent output, let's end the first IP Adapter at 40% of the steps and start the second one from there. Perfect. Now I will add a third IP Adapter, and this time I want to take elements from a scene and try to influence the output. The more IP Adapters you add, the more difficult it becomes to balance them. The channel here will be blue, for the background. Let's end the first at 30% of the steps and the second at 70%, and reduce the weight by 0.1. Minimal difference: there are stars in the carpet; however, it is negligible in the output. Let's try adding "bed" to the prompt. Okay, let's change it to "besides bed". The stars are somehow more prominent. I tried other prompts and settings, but this is the best I could get while keeping the two subjects more or less consistent.

In this workflow, I will disconnect the IP Adapter nodes and generate an image based on a prompt alone. Connect the VAE Decode output directly to the IP Adapter node, add another KSampler, and connect the relevant inputs. From this second KSampler, I will generate four outputs for variation.
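A pure-RGB color mask like the one described, blue background with red and green subject regions, can be generated with a few lines of PIL; Load Image (as Mask) then reads a single channel from it. The shapes, sizes, and file name below are placeholders.

```python
from PIL import Image, ImageDraw

W, H = 1024, 1024
# Start with a pure blue background; only pure 0/255 channel values work,
# since any color variation breaks the channel-based masking.
mask = Image.new("RGB", (W, H), (0, 0, 255))
draw = ImageDraw.Draw(mask)
draw.ellipse((80, 300, 480, 900), fill=(255, 0, 0))   # red = cat region
draw.ellipse((560, 300, 960, 900), fill=(0, 255, 0))  # green = dog region
mask.save("color_mask.png")

# Extracting one channel, which is what Load Image (as Mask) does when
# you pick "red", "green", or "blue":
red_channel = mask.split()[0]   # 255 where the cat should appear
```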
Let's control the cat's posture by adding ControlNet. I am adding a simple Apply ControlNet node along with an all-in-one (AIO) preprocessor node, and connecting the positive conditioning. I am using a Control-LoRA OpenPose model, and for the preprocessor we will use the animal pose. This is how the animal pose looks. Connect it to the sampler and queue the prompt. IP Adapter does burn the image; the solution is to lower the guidance value or reduce the adapter weight. Duplicate the positive prompt: "wearing a cowboy hat". I will combine the prompt and the ControlNet via Conditioning Combine. This is not that good. Let's add a depth ControlNet and see if that makes any difference. Way better. Let's move on to the last workflow.

In this workflow, I will show you how to use mask conditioning. Let's generate a cat wearing a cloak, and first try to change the cloak's color to pink via the prompt. That did not work at all. Add a Load Image (as Mask) node, paint over the cloak area of the cat, and add a Conditioning (Set Mask) node. When you set the conditioning area to mask bounds, it will only apply the conditioning within the masked area; the default applies it throughout the image. There are not enough pink cloaks, so let's increase the strength to four. And this would be an excellent example for switching the weight type to linear. Excellent.

One last thing left to do: I want to add a beanie hat to the cat. There are two ways to do this. The first method is to edit the mask and the prompt of the existing workflow and add the beanie hat; I was not happy with the results for this one, so I am going to show you the second method. For the second method, duplicate the Conditioning (Set Mask) and Conditioning (Combine) nodes, connect them, and queue the prompt. Having two mask conditionings in a Conditioning (Combine) caused some errors in the generation. There is an easy fix: for the beanie mask conditioning, set the conditioning area to default, and let the first one, the cloak, remain on mask bounds only. I hope the tutorial was helpful and you learned something new in ComfyUI. Until next time. [Music]
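As a footnote to the mask-conditioning workflow: "mask bounds" can be pictured as clipping the conditioning to the bounding rectangle of the painted mask rather than applying it across the whole canvas. The numpy sketch below is a conceptual illustration of that idea, not ComfyUI's actual implementation.

```python
import numpy as np

def mask_bounds(mask: np.ndarray) -> tuple:
    """Bounding box (top, left, bottom, right) of the nonzero mask pixels.
    With "mask bounds" the conditioning applies only inside this box;
    with "default" it applies across the full image."""
    ys, xs = np.nonzero(mask)
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1

# Example: a rough cloak mask on a 64x64 latent grid.
mask = np.zeros((64, 64), dtype=np.uint8)
mask[30:55, 20:44] = 1            # painted cloak area
print(mask_bounds(mask))          # (30, 20, 55, 44)
```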
Info
Channel: Control+Alt+AI
Views: 6,192
Keywords: comfyui, ipadapter, ipadapter attention mask, stable diffusion, comfyui tutorial, comfyui workflow, ip-adapter stable diffusion, comfyui nodes, comfyui easy workflow, comfyui ipa, comfyui controlnet ip adapter, comfyui ipadapter, ipadapter comfyui, ip adapter, ipadapter plus, ipadapter tutorial, stable diffusion comfyui ipadapter, comfyui ipadapter install, comfyui sdxl ipadapter
Id: KHt-2nZsY9E
Length: 27min 27sec (1647 seconds)
Published: Sun Dec 31 2023