Attention Masking with IPAdapter and ComfyUI

Video Statistics and Information

Captions
Hello everyone, my name is Matteo, I am the developer of the ComfyUI IPAdapter Plus extension, and today I want to show you an exciting new feature that I've introduced to my extension. I'm talking about attention masking, and this is probably the most important update to the IPAdapter since its introduction.

Before talking about masking, though, I want to show you very quickly another small update that I've done, which is weight type. Weight type is basically three algorithms that you can use to apply the weight. The difference between them is not big, but I think it's worth having the option. My reference image is this woman, and in the prompt I wrote "a photograph of a warrior woman in a cherry blossom forest". Let's see the result. I'm using a Plus model, so it's very strong, and I'm also using a high weight, so my text prompt is almost completely ignored. This is the result with the "original" weight type, which is what we had so far. Let's try with "linear": as you can see, now in the background I start to see a forest. So basically the difference between original and linear is that original is a little bit stronger, and linear gives more importance to your prompt. The last one is "channel penalty", and it's probably as strong as the original, maybe even stronger, but the result is sometimes sharper and it gives more details, so I wanted to add this method as well.

Let me show you all three options side by side. I'm copying all these nodes with Shift+Ctrl+V, so I get all the nodes while keeping all the pipe connections. The first one will be original, then linear, then channel penalty. Let me tidy up a little. Okay, let's see the difference: this is original, which is what we had so far, very strong; this is linear, which gives more importance to the text prompt; and this is channel penalty, which is usually sharper. You can see the difference with linear if we try to add something like "green hair": it starts adding green hair to all the pictures, but with linear all the hair is green, while in the others you can see it's half green and half red. These are kind of experimental algorithms, so let me know what you think. I don't know if they will change in the future or if I will add more, but experiment a little with them, and I'm looking forward to your feedback.

Now, the pièce de résistance: masking. I'm setting the width of the image to 768 and generating. First of all, the model spreads the character over the whole picture, because in the reference the character takes up the whole frame, and of course there's no cherry blossom in the background. So let's see what happens if I apply a mask. I need a Load Image node; I can then copy this image and paste it here. I'm using this node just as a reference, you can create the mask in any way you want; this is just very convenient because I already have the image at the right size and I already know how the composition is going to be. I open the mask editor and select where I want my character to be: in my new image the character should be rendered only in the center. Now I can connect the mask to the IPAdapter and give it a try. As you can see, the woman is now in the center, the cloak is not spread all around, and I have some beautiful cherry blossoms in the background. If you zoom in, you'll notice that the transition between the background and the character is completely seamless, especially here on the hair. It is important to note that the background is actually a photograph, while the character is still an illustration. This is because in the text prompt I have "photograph": anything that is outside the mask is taken directly from the checkpoint, and only the text prompt is used for the background in this case.
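To picture what the mask is doing, here is a rough numpy sketch of the attention-masking idea: the mask is brought down to the resolution of a given attention layer and gates where the reference image is allowed to contribute on top of the text conditioning. This is only an illustration of the concept, not the extension's actual code; all names, shapes and values below are made up.

```python
# Rough numpy sketch of attention masking (illustrative only, not IPAdapter Plus code).
import numpy as np

def apply_ip_attention_with_mask(text_attn_out, ip_attn_out, mask, weight=1.0):
    """Blend the image-prompt attention into the text attention, but only
    where the mask is non-zero.

    text_attn_out, ip_attn_out: (H, W, C) feature maps for one attention layer
    mask: (h, w) float mask in [0, 1], resized (nearest) to H x W
    weight: the IPAdapter weight
    """
    H, W, _ = text_attn_out.shape
    # naive nearest-neighbour resize of the mask to the feature resolution
    ys = (np.arange(H) * mask.shape[0] / H).astype(int)
    xs = (np.arange(W) * mask.shape[1] / W).astype(int)
    m = mask[ys][:, xs][..., None]                # (H, W, 1)
    # outside the mask only the text conditioning survives;
    # inside it, the reference image contributes on top of it
    return text_attn_out + weight * m * ip_attn_out

# toy usage: a 64x64 feature map where the reference is only allowed in the centre
text_out = np.random.randn(64, 64, 320)
ip_out = np.random.randn(64, 64, 320)
center_mask = np.zeros((512, 512), dtype=np.float32)
center_mask[:, 170:342] = 1.0                     # roughly the centre third of the frame
blended = apply_ip_attention_with_mask(text_out, ip_out, center_mask, weight=0.8)
```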
This is not entirely true, because there's always some bleeding between all the elements in the generation, so the background will never be 100% photograph and something will be taken from the reference image. But anyway, it would be extremely difficult to get a result like this with just one KSampler, in just one pass, without inpainting or anything like that. We can make another test saying "warrior woman on the streets of New York", and again the background is photorealistic while the main character is an illustration taken from our reference, and the transition between the two is absolutely perfect.

What's cool about this is that of course we can use multiple IPAdapters to merge different styles, masked at different positions inside the picture. For example, I can copy the IPAdapter node, connect the first to the second and then to the KSampler, and take a second reference image. I want one character on the left and one on the right. There are many ways we can create the masks; if they are simple, we can use the integrated mask nodes. I need a Solid Mask, a Feather Mask and a Mask Composite, and to see what we are doing I'm also including a Preview Mask. The size of the first mask will be half the horizontal size of the final picture, so 768 / 2 = 384, and I need two of these. Then I'm feathering the masks to get a better transition between the two subjects: this is the mask on the left, so I'm feathering the right side by 150 pixels; I need another one, and this one is feathered on the left. Now I need another Solid Mask, this time the size of the full generated image, 768, and this one will be black. I need two Mask Composites: the first one is the one on the left, the operation is "add", and let's preview it. Okay, this is the mask on the left, and for the one on the right I need to move the mask by 384 pixels to the right. Now we have these two masks. You can make them with Photoshop or whatever, but it is nice to know that you can make them directly in ComfyUI.

Now the first mask goes into the first IPAdapter and the second into the second, and we can re-enable the image generation. We need to tell the model what we are trying to do, and I would say that we are doing "two girlfriends laughing". Let me add some noise, that always helps, in both IPAdapters; I'm also going to try the new channel penalty algorithm, and I think that this should do. Now we have the composition split between our two references, and the transition between the two girls is perfect. We can try a few more seeds... ah, and this is a lovely picture, with two perfect BFFs.

You might say, rightfully, that the background is a little boring, so let me show you real quick another example using a Load Image (as Mask) node. I've prepared this really simple image that I'm going to use as a mask; I need three of them, so I can set a mask for each color. The first one is blue, the second one is red; then I need another IPAdapter, and we connect the second to the third. Here I'm choosing another image, a nice cool iceberg, and the color is green for the background. Let's see what happens. Generating images like this is like cheating at this point: compositing has never been easier, and I'm looking forward to seeing what you are going to do with this new feature.
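For reference, here is a small numpy/PIL sketch of how a colour-coded layout image like the one above could be split into one mask per colour. This is not the ComfyUI Load Image (as Mask) node, just an outside-of-ComfyUI illustration; the file name, exact colours and tolerance are assumptions.

```python
# Illustrative sketch: split one RGB layout image into per-colour masks
# (blue, red, green as in the example above). File name is hypothetical.
import numpy as np
from PIL import Image

layout = np.array(Image.open("layout_768x512.png").convert("RGB"))

def mask_for(color: tuple[int, int, int], tol: int = 30) -> np.ndarray:
    """Return a float (H, W) mask: 1.0 where the layout pixel is close to `color`."""
    diff = np.abs(layout.astype(int) - np.array(color)).max(axis=-1)
    return (diff <= tol).astype(np.float32)

blue_mask = mask_for((0, 0, 255))    # first subject
red_mask = mask_for((255, 0, 0))     # second subject
green_mask = mask_for((0, 255, 0))   # background (the iceberg reference)
```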
One last experiment before closing. I've taken these two very different pictures: one is a cute anime girl and the other is a photograph from a runway. I've already prepared the mask, and the prompt is more or less the same, "two girlfriends laughing, touching heads, in Tokyo". This is interesting because we were able to merge two completely different styles, and the integration is completely seamless.

But what if I want to make some changes to just one of the two girls? To do that, I can use masked conditioning with the Conditioning (Set Mask) node. Let's say I want the photorealistic girl to be blonde. I already have the mask, so I'm creating a new positive prompt, connecting it to the CLIP, then to the Conditioning (Set Mask), and as prompt I'm using "photo of a blonde woman on the streets of Tokyo". Then I need to merge the two positive prompts, and I can use a Conditioning (Combine): I take the second prompt, then the first one, and then on to the KSampler. I think we probably also need to give more weight to "blonde"; let's see if that's enough. And as you can see, now she's blonde. Of course this is just the tip of the iceberg, because we could also have dedicated negative prompts, I could add a positive prompt just for the anime girl, and we could also add OpenPose or a style ControlNet to guide the composition even further. Really, the possibilities are endless.

Just a couple of things to remember. First, the size of the mask should be the same as the final image. The Apply IPAdapter node tries to fix any difference in size, so even if you send a mask with the wrong size it is going to work, and if the mask is simple it should be okay, but of course it is better to provide the mask at the right dimensions. Second, we are generating these images with just one checkpoint, so if your subjects are very different you need a checkpoint able to handle all the styles; if the style of your reference images varies a lot, be sure to pick the right checkpoint. I think that's all for now. Go create something cool, and see you next time. Ciao!
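As a footnote to the first tip above (matching the mask size to the final image), here is a tiny PIL sketch of doing the resize yourself before handing the mask to the IPAdapter. It is only an example of preparing the mask outside ComfyUI; the file names and the 768x512 size are assumptions taken from the example generations.

```python
# Illustrative only: resize a mask to the final generation size (assumed 768x512).
from PIL import Image

FINAL_W, FINAL_H = 768, 512            # size used for the generations above (assumption)

mask = Image.open("character_mask.png").convert("L")   # hypothetical file name
if mask.size != (FINAL_W, FINAL_H):
    # NEAREST keeps hard edges; use BILINEAR if the mask is already feathered
    mask = mask.resize((FINAL_W, FINAL_H), Image.NEAREST)
mask.save("character_mask_768x512.png")
```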
Info
Channel: Latent Vision
Views: 21,833
Id: vqG1VXKteQg
Length: 11min 37sec (697 seconds)
Published: Tue Nov 14 2023