ComfyUI: IP Adapter Clothing Style (Tutorial)

Video Statistics and Information

Captions
Hi, I am Mali, and welcome to the channel. Today's tutorial focuses on how you can take a reference image and apply its clothing style to a source image. The Grounding DINO and Segment Anything models are used to automate the masking process. You can take it a step further and enhance the output with iterative upscaling. All of this is done in a single workflow; let me show you how, along with some hacks and its limitations in ComfyUI.

The workflow JSON will be available for channel paid members via a community post. Thank you for your continued support. This is not a basic tutorial, and if you are new to ComfyUI, I suggest you check out the other tutorial videos on the channel page.

Go to the ComfyUI Manager and install the custom nodes. The Impact and Inspire packs are used for masking and upscaling; the ControlNet Auxiliary Preprocessors and the WAS Node Suite are required as well. Install the pythongosssss custom scripts and the ComfyUI IPAdapter Plus nodes, and lastly install Segment Anything. After installation, restart ComfyUI from the command prompt and the browser.

You need custom models for IP Adapter and CLIP Vision. Go back to the ComfyUI Manager and open Install Models. Search for CLIP Vision and install the SD 1.5 CLIP Vision model, then search for IP Adapter and install the Plus model for Stable Diffusion 1.5.

To start, add a Load Image node and a Constrain Image node. Keep the max width and height at 1024. This node resizes the image to that resolution while maintaining the aspect ratio, which is the best and easiest method for SD 1.5 or SDXL image processing without any math nodes.

We need to segment the subject's clothing. Add the GroundingDinoSAMSegment node, then drag out and add the SAM Model Loader from Segment Anything. There are multiple models to choose from; for the tutorial, select the high-quality Vision Transformer huge (ViT-H) model, which in my testing works the best. The selected model is automatically downloaded when you hit Queue Prompt for the first time. Similarly, add the GroundingDino Model Loader; the 694 MB Swin Transformer variant will do. Connect the image here. A one-word prompt like "dress" or "clothes" will do; if you need to select multiple segments, you can separate them with a comma. Keep a threshold of 0.5 or higher; if you want multiple segments in an image, reduce the threshold. The value represents the percentage of pixels occupied by the subject relative to the total pixels of the image. Select a source image, add a preview node, and hit Queue Prompt to see a preview of the mask.

Search for and add a Mask To Image node and connect it. You can control the size of the mask via the Grow Mask node. This comes in very handy, and a single-digit increase or decrease can give desirable results. Later in the tutorial I will explain how to manipulate this value with a live example; for now I will keep the value at 6.

For the reference image, I recommend removing the background: add the rembg (remove background) node from the WAS Node Suite. Having only the subject helps with the nuances of masking the clothes, face, and hair. The idea here is to create a mask for the reference image and use that to impose the style over the sample image via the KSampler. I am going to duplicate the GroundingDino node twice and connect them; we need to separate the clothes from the face and hair. I do this because sometimes the hair may overlap the clothes. Since the subject is isolated, a lower threshold of 0.2 should suffice for the face and the hair. Duplicate the Mask To Image nodes twice. I will connect the main unconstrained image and hit Queue Prompt. It did not mask the face correctly. For the face and hair, it's advisable to have a Grow Mask node in between; this way you have a little buffer when subtracting the face and hair from the clothes. The reason it is still not masking correctly is that without removing the background, the threshold is too high. You can go with a very low threshold, but I prefer removing the background to avoid accidental segmenting.

To subtract the mask, right-click and add the Mask Minus Mask node from the Impact Pack (this node does not come up in search). The dress connects to mask 1, the face and hair to mask 2; the node subtracts mask 2 from mask 1.
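As an illustration, here is a minimal sketch of what this mask subtraction amounts to conceptually, assuming masks are float tensors in [0, 1]. The function name and shapes are hypothetical for this example, not the Impact Pack's actual implementation:

```python
import torch

def subtract_masks(clothes_mask: torch.Tensor, face_hair_mask: torch.Tensor) -> torch.Tensor:
    # Masks are assumed to be float tensors in [0, 1] with identical shapes.
    # Subtracting removes the face/hair region from the clothes mask;
    # clamping keeps overlapping pixels at 0 instead of going negative.
    return (clothes_mask - face_hair_mask).clamp(0.0, 1.0)

# Example: the face/hair mask has already been grown slightly (the Grow Mask
# node) so stray hair pixels are not attributed to the dress.
clothes = torch.ones(512, 512)
face_hair = torch.zeros(512, 512)
face_hair[:128, :] = 1.0  # pretend the top quarter is face and hair
dress_only = subtract_masks(clothes, face_hair)
print(dress_only.sum())  # area of the remaining dress mask
```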
For the style transfer, IP Adapter works best. Select the IPAdapter Plus SD 1.5 model and do the same for the CLIP Vision model. The reference dress from the GroundingDino output connects to the IP Adapter image input; for higher accuracy, connect the subtracted mask to the attention mask input as well. Add the Load Checkpoint and VAE nodes. It is essential that you choose a model trained for inpainting; the Realistic Vision 5.1 inpainting model performs the best here. I will leave a link to the model in the description. Add a VAE Encode (for Inpainting) node; it takes the pixel image and encodes it into latent space. Add a KSampler (Advanced) and connect it with the VAE Encode and the IP Adapter. Add CLIP Text Encode nodes for positive and negative conditioning and finish all the relevant connections to the KSampler. The final mask output from the source image connects to the VAE Encode. Let's do some KSampler settings and hit Queue Prompt.

The output won't always be perfect. There are many ways to correct this; I will go step by step. The first thing to do is add positive and negative conditioning, which reduces the number of undesired generations. The original dress mask is much wider than the source image; sometimes increasing the mask size helps. These distortions are caused by the IP Adapter input. Whenever using IP Adapter, always make it a standard to use Prep Image For ClipVision. Reconnect the image output and add a preview. I prefer "pad" for the crop position, and try a sharpening value of 0.5. Additional clothing-specific conditioning is typically not required; however, if I put "a silver copper red dress" in the positive prompt, it blends in the style better. You can avoid specific prompting altogether, though. Let's try another sample image without specific prompting. Some undesired results are seed-specific; before trying inpainting or conditioning, change the seed to see whether the problem is fixed. Much better.

I want to show you an example that requires manual fixing. There are certain limitations to the workflow; I will give you bad examples at the end and explain them in detail. Let's assume an output has artifacts that cannot be corrected via seed, masking, or conditioning. To fix this, manual inpainting is required. Copy the four nodes, which include the VAE Encode, Decode, sampler, and preview. Change the Grow Mask value back to the default of 6 and randomize the seed for the second KSampler. There is a node in the Impact Pack called Preview Bridge; it allows you to create a mask and pass on the image and mask as separate outputs. Right-click and open it in the mask editor, then draw a rough mask covering a little more than the damaged area. Connect all the inputs and outputs; for the model, make sure you connect the Load Checkpoint directly and not through the IP Adapter. You will want the output generated in batches, as it is convenient to choose from a range of fixes for the inpainting. Add and connect a Repeat Latent Batch node and let's generate a batch of eight.
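Conceptually, batching a single latent is just a repeat along the batch dimension. Here is a hedged sketch of that idea in PyTorch; the function and shapes are illustrative, not the node's actual source:

```python
import torch

def repeat_latent(latent: torch.Tensor, amount: int) -> torch.Tensor:
    # ComfyUI SD 1.5 latents are shaped [batch, 4, height/8, width/8].
    # Repeating along dim 0 lets one queue yield `amount` inpainting
    # candidates to choose from.
    return latent.repeat(amount, 1, 1, 1)

# A 1024x1024 image encodes to a 4x128x128 latent; batch it eight times.
single = torch.randn(1, 4, 128, 128)
batch = repeat_latent(single, 8)
print(batch.shape)  # torch.Size([8, 4, 128, 128])
```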
These are fixed but not refined. You can now upscale and add some detail to the image. Let's say you like the seventh output: search for and add a node called Image Batch Splitter from the Inspire Pack and increase the split count to the number of your batch generation. Now add two ControlNet models, one lineart and the other depth. We need ControlNet here to ensure that, even with a slightly higher denoising value, the new upscaled generation resembles the source. Reduce the strength of the lineart to 0.4 and the depth map to 0.5, and make sure the selected resolution for the preprocessor is 1024. Connect the image 7 output from the splitter to the ControlNet preprocessor inputs. Add a checkpoint and VAE loader; we have to select a non-inpainting checkpoint here, and I will be using Realistic Vision version 6 beta. Add the positive and negative conditioning and connect them to the ControlNet.

For adding details and upscaling, I will use the Iterative Upscale node. This upscaling process repeatedly applies an upscaling algorithm to the image; each iteration increases the size of the image, enhancing detail and sharpness. An upscale factor of 1.5 means it will upscale the image to 1.5 times the source. Three steps means it will split the upscale into three stages: the first scales it to about 1.17x, the second to 1.33x, and the final step to 1.5x. If you want a 2x upscale, increase the steps to four; it would then upscale by 0.25x at each step. Drag out the upscaler input and connect it with the Pixel KSample Upscaler Provider. For the upscale method use Lanczos and pick a random seed. I would increase the steps here to around the 30 to 50 range, and use the same scheduler and sampler as before. The denoise value depends on how much you want the details to change; for my sample I would choose 0.35. Connect the ControlNet, checkpoint, and VAE, and choose a preferred upscale model.

Here you can attach two hooks: one for the denoising and one for the guidance. If you want to attach both, choose Hook Combine. The guidance hook is useful when there are many iterative upscale steps: as you increase the steps, the image will likely lose some color and saturation, at which point you can add a CFG hook with a value of 10 or 12. For the tutorial I am only going to use the denoising hook. A value of 0.5 means the denoise value increases with each step, so that at the last step the denoising is 0.5. This works very well and gives a better, more coherent output.
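The per-step arithmetic works out as follows. This is an illustrative calculation assuming equal linear increments for both the scale and the denoise ramp, which is what the numbers quoted above imply; it is not the node's actual source code:

```python
def iterative_upscale_schedule(upscale_factor: float, steps: int,
                               final_denoise: float = 0.5) -> None:
    # Split the total upscale into equal linear increments, and ramp the
    # denoise hook linearly so it reaches its target on the last step.
    per_step = (upscale_factor - 1.0) / steps
    for i in range(1, steps + 1):
        scale = 1.0 + per_step * i
        denoise = final_denoise * i / steps
        print(f"step {i}: scale {scale:.2f}x, denoise {denoise:.2f}")

iterative_upscale_schedule(1.5, 3)  # 1.17x, 1.33x, 1.50x
iterative_upscale_schedule(2.0, 4)  # 1.25x, 1.50x, 1.75x, 2.00x
```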
Connect the image and the VAE inputs on the upscale node, put in the conditioning, and hit Queue Prompt. You can see the face has changed and improved, along with other details. If you want a specific face here, I suggest you use a trained LoRA. You can also see the original image was at night while this one has daylight; let's add the word "night" to the positive conditioning. Way better.

Okay, so this is the final workflow; let's review it quickly. Disable the upscale and inpainting nodes, select a source and reference image, and hit Queue Prompt. Check the image output; if it's satisfactory, you are done. This image has a slight blending overlay of the original dress, which would be corrected via upscaling. I would also like to remove the light source behind the head: draw a mask and save it to the node, enable the second sampler nodes, and hit Queue Prompt. Make a selection, then enable the upscaling nodes and proceed with Queue Prompt, adding any additional conditioning if required. And that's about it.

The workflow is quite robust; however, there are some limitations. The first and only rule is that if your source image has a full body and the reference does not, the AI won't know what to do without additional conditioning or inpainting. All the samples showcased here are at the default settings without any clothing-specific prompting. The source and the reference image can be different, but there is a limit. These are absolutely extreme poses compared to each other; in the process of applying the style, it messes up the hands and the legs. However, changing the reference image to one with full legs does a much better job. Due to the pose, this one would be hard to fix via inpainting; maybe a random seed would work here, but I wanted to show you non-cherry-picked images. Again, maybe a full-length reference could work, but there are a lot of errors in the output. For some weird reason this one worked on the first try where I expected it not to; the output is reasonably fixable via the workflow. Even though the reference has no legs, this one worked; I guess it's because of the angle and pose.

I hope you found this tutorial helpful and learned something new in ComfyUI. Until next time.
Info
Channel: ControlAltAI
Views: 10,804
Keywords: ipadapter comfyui, ipadapter, comfyui, ip adapter comfyui, style transfer ai, ip adapter, iterative upscale comfyui, stable diffusion, comfyui ipadapter, comfyui tutorial, ipadapter plus, comfyui workflow, ip adapter controlnet, ipadapter attention mask, grounding dino, segment anything, comfyui segs, comfyui sam, comfyui impact pack
Id: YG6oif_nEGk
Length: 19min 1sec (1141 seconds)
Published: Mon Jan 22 2024