STOP wasting time with Style LORAs! Use THIS instead! How to copy ANY style with IP Adapter [A1111]

Video Statistics and Information

Captions
Hello everyone, Keyboard Alchemist here, and welcome back to another Stable Diffusion tutorial. Today we're going to talk about how to use IP-Adapter to copy the style of a reference image with zero text prompts, and then generate a new and beautiful image of your own. No prompts, no workflow, no problem. Now let's go have some fun.

Because IP-Adapter is a ControlNet model, first I will show you how to install ControlNet and download the necessary IP-Adapter models. Then we will walk through how to use these downloaded models, and lastly I will show you a comparison of IP-Adapter versus ControlNet Reference Only. Go to the Extensions tab, then the Available tab. Click on the Load From button and a list of available extensions will show up. In the search bar, type sd-webui-controlnet; you will see the ControlNet extension here. Note: I already have this extension installed, but you will see the Install button on the right-hand side. Click on it and wait a few seconds for the installation to complete. Once the installation is finished, go back to the Installed tab and click on Apply and Restart UI.

Once your web UI has reloaded, scroll down to see the ControlNet section. Click on it to expand this section and view all your available options. You will see the radio button for IP-Adapter; when you click on it, the IP-Adapter preprocessor and model will be automatically selected. If you are using a Stable Diffusion version 1.5 fine-tuned model, select the sd15 preprocessor, but if you're using an SDXL model, select the sdxl preprocessor. In the Models drop-down menu we have four models to select from. They have to be downloaded from Hugging Face; I will leave links in the video description.

Here's how to download the models. First, go to lllyasviel's Hugging Face page and download the three ip-adapter .pth files that you see there. Then, as an optional step, you can go to h94's Hugging Face page and download the ip-adapter-plus-face_sd15 .bin file. Honestly, during my testing I didn't really find this model to be beneficial or to provide any better results than the other models, so I would say you can go without it. Feel free to let me know in the comments if you find some benefit to this face-plus IP-Adapter model; I would love to get some insight into how to use it effectively. Note: if you want to use the plus-face sd15 model, you need to change the .bin file extension to .pth; click Yes when the warning message pops up. Once you have downloaded these four IP-Adapter models, go to your main SD installation folder, then to the extensions folder, find your sd-webui-controlnet folder, then the models subfolder, and drag and drop your newly downloaded models there. Now go back to your web UI and reload the page. After reloading, you will be able to see the IP-Adapter models in the model dropdown menu.

Now that we have everything installed and ready to go, let's get to the fun part. Again, here is the reference image that we will be working with; the idea is to copy the style of this image and make a new image out of it. We will start by using the ReV Animated model, but really you can use any general-purpose model. I'll show you a comparison of a few different models that I have tried, and you will see why. Here are my text-to-image parameters: we will use the DPM++ 2M Karras sampler, 40 sampling steps, 768 width and 512 height, and everything else can be the default value. Scroll down to ControlNet, enable a ControlNet unit, and check Pixel Perfect. Then put your reference image into the image box here. Click on the IP-Adapter radio button to enable it. Since we're using ReV Animated, which is an SD 1.5 fine-tuned model, we are going to select the sd15 preprocessor, and we will also choose the ip-adapter_sd15 model. Click on Just Resize and generate an initial image to see what it looks like.

We see that the image has the right buildings, river, etc., but the image is too plain; the lines are not sharp and there's not enough detail compared to the original. We definitely need to tweak some things. Since not all seeds are created equal, I did a batch of four images and found a seed that created an image with a composition that I liked, so that I could reuse the same seed in my next testing step. Note: you can see here that I am using hires fix with the 4x-UltraSharp upscaler. This is not needed when you generate a batch of images to find a good seed, because it will increase your generation time, but hires fix and the 4x-UltraSharp upscaler are important for generating the final image with more detail and sharper outlines. I like this image on the right, so I'm reusing the seed and added some simple negative prompts. As you probably know, negative prompts have a big impact on the quality of the image; here is my simple negative prompt, consisting mostly of quality keywords. Then we generate another batch with the negative prompts. We can clearly see that the image quality is much higher: there is much more detail in everything, the lines are sharper, the shadows are deeper, and the entire image now has some material feel to it, as opposed to the flat image from before.

Now, the cool thing with IP-Adapter is that while you are using the reference image in place of text prompts, you can still add additional positive prompts to generate the same style of image with different objects in it. For example, here I put in some simple keywords like "river" and "boat", and to my delightful surprise I got a house on a boat. I kind of like this accidental image. I modified the keywords a bit and added "dark clouds" and "thunderstorm"; in this new image we have a castle in the background, dark rain clouds, and a storm. Next, we can increase the weights of some of the keywords to make these objects or features more prominent in the image. Note: for generating this batch of images, I changed the seed back to random before generating. In the resulting images we can see that the dark clouds and thunderstorm are definitely getting more prominent, but the castle is still not very noticeable. This is because the main object in the reference image is the house or cottage, so when we have IP-Adapter turned on, it is going to do its job and maintain the composition of the original image as much as possible. I will show you a bit later in the video how to solve this problem and make the castle more prominent.

Before we do that, I want to show you the effects of increasing denoising strength within hires fix. I will be using the XYZ plot script to try a range of denoising strength values from 0.35 to 0.85. If you don't already know how to use the XYZ plot script, it's a great tool for trying out a range of different values without having to manually enter a different number before every generation; check out my previous video on XYZ plots to learn more. Now, remember what hires fix is doing: it is essentially performing an image-to-image latent upscale. In other words, it takes the base 768x512 image that was just generated, combines it with a matrix of random noise, and then upscales it by our specified upscaling factor of 1.5. The denoising strength determines how much of the random noise matrix is added to your base image. What gets generated from the random noise is controlled by the text prompts, so the higher the denoising strength, the more changes you will likely see in your output image that resemble the text prompt. That is why we see more of the objects described by the text prompt at higher denoising strength values.

After seeing the effects of the denoising strength values, I changed my prompts a little bit to "(castle:1.2), towers, mountain, cloudy skies" and generated a bright, sunny version of the same image. Comparing this bright and sunny version with the reference image, it is a little bit closer, but I realized the reference image is in a different aspect ratio. It looks like it was 16:9, but my image was using 4:3, so I changed the width to 960 pixels and the height to 540 pixels. As you may know, the aspect ratio changes the composition of the image quite a bit, but IP-Adapter kept the style of the reference image consistent. Then, after a lot of tweaking of the prompts and trying a range of denoising strength values, I arrived at this final image, using the prompt "(castle:1.2), towers, mountain, cloudy, boat" with 0.4 denoising strength and hires fix.

This final image looked pretty good, but we've got to go deeper. I wanted to test what different types of models would do to the final image. As you can imagine, each model has a unique style and produced a slightly different result. As we can see, I used a couple of stylized models, an anime model, and two realistic models. All the models I tried gave me a final image that is similar to the style of the reference image, except for the anime model. Don't get me wrong, I actually love the painting-like style generated by this anime model, but since our objective was to copy the style of the reference image as closely as possible, this result is not ideal. This brings me back to the point I made earlier in the video: as long as we use a general-purpose model such as DreamShaper, Photon, or epiCRealism, the resulting image was pretty close to the reference image. So I would encourage you to try a few different models when you try to replicate the style of a reference image and see which one gives you the best results.

Now, coming back to IP-Adapter, I want to show you what parameters you can tweak within the ControlNet unit and how they affect your output image. First, the preprocessor. This one is pretty straightforward: if you're using the SDXL base model or a fine-tuned SDXL model, you have to select the sdxl option. Then there are the three different resize modes. This setting doesn't seem to change the output, as all three images look the same; this could be because we have Pixel Perfect turned on. So feel free to use whichever mode you want, or just the default one. For the model choices, there is sd15 versus sd15 plus, and we can see that they give different results. I think sd15 provides a style that is more like the reference image, so I like it better, but that is not to say that sd15 plus doesn't have its uses; perhaps in a different case sd15 plus will be better, so it's probably best to try both.

Now we come to the more interesting parameters: control weight, starting control step, and ending control step. Control weight determines how much the reference image will influence your output image; needless to say, the higher the control weight, the more your final output image will be controlled by your reference image. As we can see from this XY plot, when the control weight is low, the image looks a lot different, because the final image is mostly influenced by the text-to-image generation process. But as we increase the control weight, the final image is more and more controlled by the IP-Adapter process and therefore looks more like the reference image.

Now let's take a look at starting control step versus ending control step; we have to consider these two parameters together. These two values can be plotted as ControlNet Guidance Start and ControlNet Guidance End within the XYZ plot script. A higher value for the starting control step, plotted on the horizontal axis, means that IP-Adapter (or ControlNet) will start only after a certain percentage of sampling steps have been completed; for example, a start value of 0.4 means IP-Adapter starts when 40% of the sampling steps are done. Setting ControlNet guidance start to 1 is equivalent to having no ControlNet at all, because this tells ControlNet to start when 100% of the sampling steps have completed. By the same logic, a higher value for the ending control step, plotted on the vertical axis, means that IP-Adapter (or ControlNet) will stop once a certain percentage of sampling steps have completed. We can see from the example XY plot that if you start ControlNet later, i.e. set the starting control step to a higher value, you will retain more of the image that was generated from your text-to-image prompt before ControlNet starts applying the reference style. So if you want to see more of a change in your output image based on your text prompt (in my case, I mentioned earlier that I wanted to see more of the castle), set the starting control step to a larger value, say 0.4, and you will see a very prominent castle as the main object of the image. But remember, do not start ControlNet too late, because if you do, you won't have any of your reference image's style or elements in the final picture. To strike a good balance between having your text-to-image prompt show up prominently in your final image and still retaining the style of your reference image with IP-Adapter, I would recommend a ControlNet starting value slightly below 0.5 and an ending step of 1.

Hey, if you enjoy my videos, please do me a favor and hit the like and subscribe buttons. Your likes and subscriptions help me grow this channel and allow me to continue making quality content. Thank you!

Right now you might be thinking: doesn't IP-Adapter look awfully similar to another ControlNet model that we know? If you're thinking of ControlNet Reference Only, you are right. But based on my testing, these two models do not do exactly the same things. I found that IP-Adapter is good for replicating the style of the reference image, and you can apply that style to a new object or image composition; think of it like an image-generating Kakashi. Here is an example: I took this reference image of a girl with rainbow hair from Civitai, applied its style to this Spider-Gwen image with image-to-image, added ControlNet OpenPose as a second ControlNet unit, and was able to create a new image of a girl with rainbow hair in the pose of the original Spider-Gwen image. What IP-Adapter cannot do very well is copy the face of the reference image exactly. As we can see here, I tried combining the reference image with a range of different secondary ControlNet units, and I was not able to replicate the exact face of the original rainbow-haired girl. Here are examples using the sd15 face plus model, and the results are still not good. In contrast, ControlNet Reference Only is a lot better at replicating faces. Here is the same reference image run through the Reference Only preprocessor with a second ControlNet unit using the Canny model; it gave us an almost identical face to the original image. Keep in mind the original reference image was generated with an SDXL fine-tuned model, and here we're using an SD 1.5 fine-tuned model to replicate it, which is absolutely amazing. On the other hand, ControlNet Reference Only cannot do what IP-Adapter does, which is apply the style of the reference image to an entirely new image or object. Take a look at these examples: they do not look good at all, even though I used the exact same workflow and parameters, just switching out IP-Adapter for Reference Only.

So here's the summary: IP-Adapter is good at replicating styles but not so great at replicating faces; Reference Only, while good at replicating faces, is not very good at replicating a style and applying it to a new image. In a way, these two complement each other. I guess what I'm trying to say is: don't expect one model to do everything; use the right tool for the job. That's it for today. I hope you enjoyed this video and found it helpful. I would appreciate it if you show your support by clicking the like button and subscribing to this channel; it will help me a lot. Thank you, and I will see you in the next video.
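The model-download step described in the transcript (three .pth files, plus the optional plus-face .bin renamed to .pth, dropped into the ControlNet models folder) can be sketched in Python. The file names and folder layout here are assumptions transcribed from the video, so verify them against the actual Hugging Face pages and your own install before relying on them:

```python
from pathlib import Path

# Hypothetical file list based on the video (from lllyasviel's and
# h94's Hugging Face pages); check the pages for the current names.
MODEL_FILES = [
    "ip-adapter_sd15.pth",
    "ip-adapter_sd15_plus.pth",
    "ip-adapter_xl.pth",
    "ip-adapter-plus-face_sd15.bin",  # optional; renamed to .pth below
]

def destination_name(filename: str) -> str:
    """The ControlNet extension only lists .pth files, so a .bin
    download is renamed before being dropped into the models folder."""
    if filename.endswith(".bin"):
        return filename[: -len(".bin")] + ".pth"
    return filename

def destination_path(sd_root: str, filename: str) -> Path:
    """Models go under <SD root>/extensions/sd-webui-controlnet/models."""
    return Path(sd_root, "extensions", "sd-webui-controlnet", "models",
                destination_name(filename))
```

After copying the files, reload the web UI so the model dropdown picks them up, exactly as the transcript describes.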
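The hires fix explanation above can be made concrete with a small sketch. In an img2img-style latent upscale, denoising strength roughly controls what fraction of the sampling steps are re-run on the noised, upscaled latent; the exact scheduling varies by implementation, so treat this as an illustration of the idea, not A1111's actual code:

```python
def hires_fix_steps(sampling_steps: int, denoising_strength: float) -> int:
    """Approximate number of steps actually re-run during the hires pass.

    At strength 0 the upscaled latent is left untouched; at strength 1
    it is fully re-noised and re-generated, so the text prompt can
    reshape the whole image.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising strength must be in [0, 1]")
    return round(sampling_steps * denoising_strength)
```

With the video's 40 sampling steps, a strength of 0.4 re-runs about 16 steps while 0.85 re-runs about 34, which is why the XYZ plot shows higher values drifting further from the base image and toward the text prompt.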
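The starting/ending control step behavior can be modeled as a simple gate applied at each sampling step, with both sliders expressed as fractions of the total step count. This is a conceptual sketch of how the two values interact, not sd-webui-controlnet's actual implementation:

```python
def controlnet_active(step: int, total_steps: int,
                      guidance_start: float, guidance_end: float) -> bool:
    """ControlNet (here, the IP-Adapter unit) is applied only on steps
    whose progress fraction lies inside [guidance_start, guidance_end]."""
    progress = step / total_steps  # 0.0 on the first step
    return guidance_start <= progress <= guidance_end

# With 40 steps and guidance start 0.4, the reference style only kicks
# in once 40% of the steps are done, leaving the early steps to the
# text prompt (e.g. the castle the video wanted to see).
active_steps = [s for s in range(40) if controlnet_active(s, 40, 0.4, 1.0)]
```

Note that a guidance start of 1.0 never activates, since the progress fraction stays below 1.0 during sampling, which matches the transcript's point that guidance start = 1 is equivalent to disabling ControlNet.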
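For readers who prefer scripting over clicking through the UI, the whole setup (sampler, resolution, one IP-Adapter ControlNet unit with the weight and guidance-step settings discussed above) maps onto a single request to A1111's txt2img API. The field names below follow the sd-webui-controlnet API conventions as I understand them, but they change between versions, so treat this payload as a hedged sketch and check it against your installed extension's API docs:

```python
def ip_adapter_payload(prompt: str, negative: str, ref_image_b64: str,
                       weight: float = 1.0, start: float = 0.0,
                       end: float = 1.0) -> dict:
    """Sketch of a POST body for /sdapi/v1/txt2img with one ControlNet
    IP-Adapter unit; module/model names are assumptions, verify them
    against your install's /controlnet/module_list and model list."""
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "sampler_name": "DPM++ 2M Karras",
        "steps": 40,
        "width": 768,
        "height": 512,
        "alwayson_scripts": {
            "controlnet": {
                "args": [{
                    "enabled": True,
                    "module": "ip-adapter_clip_sd15",  # SD 1.5 preprocessor
                    "model": "ip-adapter_sd15",        # downloaded model
                    "image": ref_image_b64,
                    "weight": weight,
                    "guidance_start": start,           # e.g. 0.4 for the castle
                    "guidance_end": end,
                    "pixel_perfect": True,
                }]
            }
        },
    }

payload = ip_adapter_payload("(castle:1.2), towers, mountain, cloudy, boat",
                             "lowres, blurry", "B64DATA", start=0.4)
```

The payload would then be sent with any HTTP client to a running web UI started with the `--api` flag.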
Info
Channel: Keyboard Alchemist
Views: 24,757
Keywords: stable diffusion, automatic 1111, stable diffusion tutorials, a1111, AI Art, AI, Tips and Tricks, Tutorials
Id: rOOhvZ-8Y0w
Length: 17min 46sec (1066 seconds)
Published: Thu Oct 26 2023