The Ultimate Guide to Master Comfy UI Controlnet: Part 1

Video Statistics and Information

Captions
Hi everyone, and welcome back to the channel. Today we're going to start covering the much-requested ControlNet. This is going to be split into several videos to make it easier for everyone to find what they want. Today we'll cover what ControlNet is, why you'd want to use it, and how to use it in ComfyUI. The next video will cover the same but in Automatic1111, and eventually we'll do a deeper dive into how ControlNet actually conditions the Stable Diffusion model to get the results we desire. I'm also happy to do additional videos on specific ControlNet models, so please let me know in the comment section below if there are any you'd like me to dive deeper into.

For those of you who've heard about ControlNet but don't have any idea what it actually does: it is a series of models that can provide additional inputs to Stable Diffusion, giving you more control over what the final image looks like. On the screen I've got a simple example of me taking a pose and creating a new image with that pose. Depending on the model you use, you can provide different inputs such as poses, as I've done here, sketches, line art, or even depth maps, and these will condition your Stable Diffusion model to produce an output that incorporates that input. To put it simply, you can now create consistent elements in your Stable Diffusion images, whether that's having your images share the same pose or the same room layout, or simply maintaining certain characteristics of a character or creature. It can be very powerful, and today we're going to dive deeper into how to make the most of it.

Before continuing, I just wanted to remind you to please like and subscribe; it really helps the channel out as we work our way towards 1,000 subs. I also wanted to mention my Patreon. I currently put out a video every 7 to 14 days depending on how busy I am. I have a ton more videos and ideas I'd like to dive into, and I'd really like to bring an editor on board, as this is my biggest bottleneck. The funds also help cover the cost of making the videos on RunPod, so if you really want to support the channel, please check it out.

Now, diving into ControlNet: we've already discussed how, depending on the ControlNet model you use, you can influence your Stable Diffusion output differently, so let's go through each of the models and understand what they do and what their inputs are. The first model we're going to look at is the ControlNet Canny model. What the Canny model looks at is a monochrome image with white edges on a black background, and we can see here, enlarged, what the Canny image looks like: the background is black, the outline of the subject is drawn out in these lines, and details are drawn in to represent the feathering pattern. When we plug that into ControlNet along with our text prompt, this is the output that we get, and the output looks almost exactly like the input, with color and detail added in.
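To make that concrete, here's a minimal sketch of what a Canny-style preprocessor does under the hood, using OpenCV directly rather than the ComfyUI node; the file names and threshold values are just placeholder assumptions on my part, not settings from the video.

    # Minimal Canny preprocessor sketch (assumes opencv-python is installed).
    import cv2

    # Load the reference image as grayscale (file name is a placeholder).
    image = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)

    # Detect edges: the result is white edges on a black background,
    # exactly the kind of image the Canny ControlNet expects.
    edges = cv2.Canny(image, 100, 200)  # thresholds are arbitrary placeholders

    cv2.imwrite("canny_control_image.png", edges)

Lowering the thresholds keeps more fine detail (like the feathering), while raising them keeps only the strongest outlines, which is the same trade-off the ComfyUI preprocessor node exposes.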
The next model is what's known as a depth model. ControlNet now has different variations of this: this is the regular depth model, and there's also one called Zoe Depth, which is a more detailed version that gives you more control, including allowing you to represent different depths in different colors so that you can even specify distances from the camera in meters. For the purposes of this video we just need to understand how a basic depth model works. Again the background is black, but there are different areas shaded in gray: the subject is outlined in white and different shades of gray, which tells us the shape of the subject and where the different elements are. For example, the neck area is a little bit darker, which represents it being slightly further away from the camera. In the output, the overall shape is the same, the model understands that the background is further away, which is why it's blurred out, and based on the text prompt, which unfortunately we don't have, it's able to work out that this is a person and where the different facial features go.

The next model is the HED model, and this one, like the Canny, works by detecting soft edges on a black background. Interestingly, this image input has a lot less detail than the Canny one and almost looks more like a sketch. The outline of the subject is displayed, and some details are drawn on the inside of the bird, almost shaded in; it's not quite a sketch, but it is a soft representation of the bird. The output is very similar to the Canny one. There's probably less detail contributed here by the person and more by the model, because we didn't come in and draw those details in like we did with the Canny input. Just for comparison, this is what the Canny output looks like; again, we don't have the text prompts, so we can't tell whether the extra color and vibrance was decided by the model or by the user.

Next we have another line-based input, and like the Canny, these are harsh lines on a black background. This almost looks like something drawn in Paint, and it could be a very useful model for interior designers: you could sketch out the details of a room using straight lines and then get an output like this based on the prompt. It would be interesting to see how this compares with the Canny; if you want me to do some comparisons of creating environments with the Canny model and with the HED model, please let me know in the comment section below. Essentially they're all edge-based models, taking edges and lines and turning them into an image. I'm going to skip over the next one, because I honestly haven't seen that model used much in my research.

Now, this is the OpenPose model, and this is probably the one I've seen used most commonly. What the OpenPose model does is take an input which is this image of a stick figure, and you'll notice that each element of the stick figure is a different color, so the model understands each color to represent a different body part. There are different ways you can create this pose: I believe Blender has the capacity to output OpenPose skeletons, and you can draw them in Photoshop.
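If you'd rather extract that colored stick figure from a reference photo instead of drawing it, a pose preprocessor can do it for you. As a hedged sketch outside ComfyUI, the controlnet_aux Python package ships an OpenposeDetector; the file names, and the annotator weights repo it downloads from, are my assumptions rather than anything shown in the video.

    from PIL import Image
    from controlnet_aux import OpenposeDetector  # pip install controlnet_aux

    # Downloads the annotator weights on first use (repo name is an assumption).
    detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

    reference = Image.open("model_photo.png")      # placeholder file name
    pose_image = detector(reference)               # colored stick figure on black
    pose_image.save("openpose_control_image.png")  # feed this to the OpenPose ControlNet

This is essentially what the OpenPose preprocessor node in ComfyUI does for you behind the scenes.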
When the model takes this in, it understands the different body parts and creates an output that represents them. This is not the best output, but all of the body parts are in the right place relative to the pose: the arm comes down, the shoulders are across here, and the other arm comes around here, which is a little deformed and could do with a bit of tweaking, but we can see the shoulder, the arm, the elbow, and the hand coming up, and these are where the facial components are. Interestingly, if I jump over to the OpenPose page, we can see another example of a person (Jovic, I think) with a more detailed version of the skeleton, where you can actually define details in the hands, and even the face is drawn out in more detail, allowing you to specify facial expressions. And, sorry, this is not the OpenPose output, this is the input: they've taken this image, created an OpenPose skeleton from it, and then generated an output from the model, and again it's not the best output, but all of the pieces are in the right place.

Coming back, we've now got the Scribble model, and this one is really useful for people who aren't artists. This is actually a really well-drawn scribble, but I've seen examples where people have just drawn out basic shapes, told the prompt what they want, and the model is able to figure out what the scribbles are meant to represent and give you an output. Again, the general shape of the person follows what we've outlined in the scribble.

The final model is the Seg one, and this one is also really interesting. Similar to the depth map, it's got different objects in it, but instead of representing depth, each color represents a different object or type of object. Looking at the final image: we've got this bookshelf here, the orange parts represent things on the shelf, we've got two sections of gray which represent the wall, blue is the chair, green is the table, purple is the sofa, white is the window, and we can see all of those elements represented in the final image.

Before continuing, it's important to note that the ControlNet models come in different formats and for different Stable Diffusion models. The original creator, lllyasviel, has released a series of ControlNet models, but all of them are for Stable Diffusion 1.5. It's also important to note that the ControlNet models issued by lllyasviel come with Stable Diffusion 1.5 already bundled into the model, meaning you won't be able to use them with your own checkpoint. Instead, what you want to do is look for the models that contain only the ControlNet component, which are much smaller in size, so they'll be easier on your hard disk space, and you can then pair them with whatever Stable Diffusion checkpoint you want. On top of that, there are now two different versions of the ControlNet models: the ones trained on 1.5 and the ones trained on SDXL. Not all of the ControlNet models have been ported over to SDXL yet, and the SDXL ones have not been created by the original creator; instead they've been created by different organizations. I'm sure we'll get more ControlNet models from lllyasviel at some point, but for now we're using SDXL models created by third parties: in my case, the models created by Diffusers for Canny, Depth, and Zoe, the one by thibaud for OpenPose, and Tencent ARC also has another Canny model.
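For reference, this is roughly what "a ControlNet-only model paired with whatever checkpoint you want" looks like outside ComfyUI, using the Hugging Face diffusers library. It's a minimal sketch, assuming the publicly listed repo IDs for the SDXL Canny ControlNet and the SDXL base checkpoint; treat them as assumptions if they've moved.

    import torch
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

    # Load only the ControlNet component (a few GB, much smaller than a bundled model)...
    controlnet = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
    )

    # ...and pair it with any SDXL checkpoint you like.
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

The same separation is what the ComfyUI workflow relies on: one loader for your checkpoint, a separate loader for the ControlNet.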
So now that we've gone through the models and understand what each of them does, the next question that comes up is how we get the input images for some of them. The Canny one, for example, seems overly complex and difficult to create, especially if you're not an artist; and even if you are, by the time you've spent the effort drawing out all of those details, you may as well just keep drawing and create the image yourself. So how do we get an input image? One of the most common ways to use ControlNet, regardless of the input type, is to start from a reference image. In this case let's find a reference image to use; we're going to do a dog, so let's grab a dog. Here we go, perfect, this is a great one. What you do is feed that image through something called a preprocessor, which takes the source image and spits out an output based on the parameters you give it. I'm using some custom nodes, and here is the one I used. I've included a link to the images in the description below, so if you want to download them and follow along, please go ahead; they're also available on the Patreon.

We're going to grab the Canny preprocessor. We've actually got two: I believe one comes with ComfyUI, and there's one we downloaded from the custom nodes we brought in earlier, so we're going to try them both and see how they perform. Because we're using SDXL, remember to set your resolution to 1024. We grab our doggy image, feed it into both Canny nodes, and before we do anything else, feed them into a preview just to see what we get. Both Canny nodes gave us an image output we could use later, with all of those details drawn in, probably enough to make a face. We're going to use the one from the custom node because it's given us a nice starting point, and we're going to try to turn this dog into a bear, because I can definitely see some elements here that could work for a bear.

Now, to use this ControlNet input we need to add a couple of important nodes. One is the Load ControlNet Model node: we type in "controlnet", find the ControlNet Loader, and that lets us choose the model that is going to interpret the input. In this case I had OpenPose loaded, and we want a Canny one, which I don't have here. So if we head back over to Diffusers, we know they have a ControlNet Canny model: let's open it up, head over to Files, grab the regular safetensors version, and copy the link address. Whether you've got this installed on your own computer or you're using RunPod like me, you want to head into your models folder, and the controlnet folder is where we want to drop the model. In my case I'm going to open up the terminal, make sure I'm in the right folder, and run wget with the correct model URL. Now that we've got the model downloaded, we can go ahead and refresh. It downloaded with the same name as a file I already had, so we're just going to rename it to canny.safetensors, refresh, and there it is.
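If you'd rather not hand-type a wget command in the terminal, here's a minimal Python equivalent using the huggingface_hub package. The repo ID, file name, and ComfyUI install path are assumptions based on what's shown in the video, so adjust them to your own setup.

    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    # Downloads the ControlNet-only safetensors file straight into ComfyUI's
    # controlnet models folder (path is a placeholder for a RunPod-style install).
    hf_hub_download(
        repo_id="diffusers/controlnet-canny-sdxl-1.0",
        filename="diffusion_pytorch_model.safetensors",
        local_dir="/workspace/ComfyUI/models/controlnet",
    )
    # Rename the file afterwards (e.g. to canny.safetensors) so it is easy
    # to pick out in the ControlNet Loader dropdown.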
So let's feed the image in, change our positive prompt to "bear looking at camera, smiling", and queue it up. While not perfect, you can see, let me enlarge this, that the basic elements of the Canny are there: the positioning of the nose, the positioning of the mouth, the location of the eyes, all of those details are there. Interestingly, because we've only got the snout, there's enough blank space, or negative space, for the model to fill in the rest of the shape of the bear.

Now let's look at OpenPose, since this is the other one that's most commonly used. Here's one I had prepared earlier: we've got this model, again an image I got off the internet, where she's standing against a background. We want to take this pose and make something similar; let's say we like it and want to use it for something else. Like before, we've got the preprocessor here, in this case OpenPose pose recognition. I almost forgot to mention: if you do want to use this custom node, make sure you have the node manager installed, then search for the author who created it, Fannovel16, and the pack is called ComfyUI ControlNet Auxiliary Preprocessors. If you try to look for it by the node name it may not show up, which is what happened to me. We've loaded the OpenPose XL2 safetensors model and applied a strength here.

I didn't get into this earlier: we loaded the ControlNet model, but I completely forgot to mention that we also need to apply the ControlNet. We've got a node that does that specifically, and what's happening is that we take the text conditioning and plug it into the Apply ControlNet node, we take the image generated by the preprocessor, which before was the Canny and is now the OpenPose skeleton, and feed that in, we take the ControlNet model, and it combines all of that information to give you a conditioning, which you then feed into the positive input. We're using this to feed the model and tell it what we want. You could also do this in reverse, where we feed a ControlNet with a Canny image and a prompt in as a negative input, telling the model what we don't want. For example, say there are elements from the reference image that keep coming into the final image that we want to remove: let's say there's a moon in the background, and through the edge detection that moon keeps coming up and we just want to get rid of it. We could recreate that same image, rub everything out except the moon, turn that moon into a negative input, and in theory that would remove it. If that's something you'd like to see me experiment with in a separate video, please let me know, and we can do an entire video on negative ControlNet inputs.

Once again, with Apply ControlNet we can also set a strength, meaning how much impact we want the ControlNet image to have on the final output. We queue up the prompt, and more or less everything is in the right place. At this point I'm only using the SDXL base model, so the image quality is not the best, but we more or less have the elements we want: the arm comes down in the right way, the other arm is here correctly, and if we look at the prompt, "blonde woman holding sunglasses in her hand, smiling at the camera", it's all technically there.
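As a point of comparison, here's a hedged sketch of the same idea outside ComfyUI, using the diffusers library, where controlnet_conditioning_scale plays roughly the role of the Apply ControlNet strength. The repo IDs, negative prompt, strength value, and file names are assumptions for illustration, not the exact settings from my workflow.

    import torch
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        "thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16
    )
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    pose = Image.open("openpose_control_image.png")  # the stick-figure image

    image = pipe(
        prompt="blonde woman holding sunglasses in her hand, smiling at the camera",
        negative_prompt="deformed, blurry",            # placeholder negative prompt
        image=pose,
        controlnet_conditioning_scale=0.8,             # analogous to the node's strength
        num_inference_steps=20,
    ).images[0]
    image.save("openpose_result.png")

Lowering the scale lets the text prompt dominate; raising it makes the output stick more rigidly to the pose, which is exactly the trade-off the strength slider controls in ComfyUI.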
But we do have some issues with deformation, so we're going to clean this up with the refiner model and see if we can get it cleaner. I do have a video planned on doing hand and face replacement to improve the output further as well.

Okay, and we're back. What we've done, as in previous videos, is extract the descriptions from the CLIP text encoders into text boxes, create the refiner section here, add in those same CLIP text encoders and connect them up, replace the KSampler with the advanced one, and set up the steps again. If you want to know what's going on here, the details of how the KSampler works and why we made these changes, please check out my previous videos. We've kept the steps at 20, with "return with leftover noise" enabled in the base KSampler and ending at step 17, and I personally like to add a little extra noise back in the refiner, meaning we add a couple of extra steps and start at step 17 (for anyone following along outside ComfyUI, there's a rough equivalent of this split sketched at the end of these captions). We've got the refiner model loaded in, so let's see what the polished version looks like. Ah, right, we're not feeding in the latent image and the model, so let's try that again.

So here's the finished image coming out of the refiner. There's definitely more work to be done; I'm still not happy with the hands, but it does look a little more polished than before. I think the seed changed, which is why the character has a different outfit, and it still has a bit of an airbrushed look to it. I'll be posting this image in the description section below, and if you want me to continue working on it and refining it, I'll be posting additional refinements on the Patreon. I'm going to end the video here, because we could just keep going on and on about how to use ControlNet. I hope this gives you a basic understanding of how ControlNet works and how to set it up in ComfyUI. If there's anything you want me to dive deeper into on ControlNet, please let me know.
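As promised, here's the rough non-ComfyUI equivalent of that base-plus-refiner split, as a minimal diffusers sketch. It assumes the standard SDXL base and refiner repos, leaves the ControlNet out for brevity, and maps "end at step 17 of 20" onto the fractional handover point 0.85; all of that is my approximation, not the exact workflow.

    import torch
    from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "blonde woman holding sunglasses in her hand, smiling at the camera"

    # Base model handles roughly steps 1-17 of 20 and hands over noisy latents.
    latents = base(
        prompt=prompt,
        num_inference_steps=20,
        denoising_end=0.85,        # roughly "end at step 17, return leftover noise"
        output_type="latent",
    ).images

    # Refiner picks up from the same point and finishes the remaining steps.
    image = refiner(
        prompt=prompt,
        num_inference_steps=20,
        denoising_start=0.85,      # roughly "start at step 17"
        image=latents,
    ).images[0]
    image.save("refined_result.png")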
Info
Channel: Endangered AI
Views: 11,144
Keywords: controlnet stable diffusion, controlnet tutorial, controlnet 1.1, controlnet comfy ui, comfyui, comfy ui, stable diffusion, stable diffusion comfy ui, generative ai, ai, ai art, ai art tutorial, ai art generator
Id: fyCb64Yx2vc
Length: 21min 50sec (1310 seconds)
Published: Sun Oct 08 2023