Complete Comfy UI SDXL 1.0 Guide Part 2 | Tweaks and Upscaling

Video Statistics and Information

Captions
hi everyone and welcome back to the channel. I'm really glad that the last video was helpful for so many people. I got a huge number of comments and subs from it, and I'm really happy to see that it helped a lot of you. Today we're going to focus on building on and improving last week's workflow, as well as adding upscaling, so that in future videos we can start to explore ControlNet and more advanced techniques such as combining Stable Diffusion 1.5 models with SDXL. I also want to clarify a few things that we did in the last video that maybe are not standard practice. At the end of the day it's about doing what works for you and what gives you the best output, but it's important to know what the recommended inputs are, which is something we're also going to cover today.

Before I get into it, please don't forget to like and subscribe. I also want to mention that, even though the channel is small, I have started a little Patreon where I will start releasing videos a little earlier and have a few exclusive downloads, spreadsheets and whatnot. This is mostly because it is starting to get a little expensive to make these videos: I produce them on RunPod, and the research and production time is starting to cost quite a bit, so any support towards offsetting those expenses would be greatly appreciated. I also want to start putting out videos a little more frequently, once a week, and being able to hire an editor would really help with that as well. However, I don't want to exclude anyone, so the lowest tier, which gets access to all of this, will be as low as two dollars, but if you want to give more and help support the channel there will be other options as well. Once again, any support is appreciated.

So, jumping into it. As I said, just because these are the recommended nodes and techniques doesn't mean you have to use them, but it is something you should be aware of. One of the first things we're going to clean up is the CLIP encoders: we're going to swap them out for SDXL-specific ones. If we move our nodes a little to start the cleanup, we can see when we look through the available nodes that there is actually a CLIPTextEncodeSDXL and a CLIPTextEncodeSDXLRefiner, so we're going to go ahead and grab them. The SDXL version of the CLIP text encoder carries a bit more information than the regular one: we now have a width and height, some cropping parameters, a target width and height, and not one but two text input boxes. One of the benefits of SDXL is that it allows us to pass through more refined information, not just the prompts we're putting in but also the sizing parameters of the image. I'm not going to get into what manipulating the width and height does at this point; just be aware that there is a lot more we can play around with.

The main thing we're going to focus on right now is the new clip G and clip L inputs. The way you're supposed to use these is to put the subject of the image you want to generate in clip G, while in clip L you put additional information such as style (whether you want a particular art style or a photograph), lighting information, and so on. However, in practice what has been found is that combining, mixing and adjusting the information you put in either of the clips can have significantly different results.
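For anyone who wants to see the same idea in code, here is a minimal sketch using the Hugging Face diffusers library rather than ComfyUI itself. The mapping is my assumption: diffusers sends prompt_2 to the larger OpenCLIP encoder (roughly the clip G box) and prompt to the smaller CLIP encoder (roughly the clip L box), and the size arguments mirror the node's width/height, crop and target width/height fields. The model name, prompts and sizes below are placeholders, not values from the video.

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="detailed digital photography, soft autumn lighting",   # style-ish text (clip L analogue)
    prompt_2="an astronaut sitting in a pile of fall leaves",       # subject text (clip G analogue)
    negative_prompt="blurry, deformed, extra limbs",
    negative_prompt_2="blurry, deformed, extra limbs",
    width=1024,
    height=1024,
    original_size=(1024, 1024),    # size conditioning, like the node's width/height
    target_size=(1024, 1024),      # like the node's target width/height
    crops_coords_top_left=(0, 0),  # like the node's cropping parameters
    num_inference_steps=20,
).images[0]
image.save("astronaut.png")
```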
I'm happy to make a separate video diving into the specific differences between clip G and clip L if that's something you guys are interested in; please leave a comment in the comment section below. For today, though, we're just going to do a very basic exploration of it.

So, like before with the previous CLIP text encode, we are just going to go ahead and convert our clip inputs into text. At the moment we've only got one positive text input, so we're going to clone it so that we've got two text inputs, and we're going to grab these and plug them in. We're just going to leave clip L blank for now (just make sure that you delete whatever is in it) so that we can start to explore the changes that this makes, and of course let's not forget to attach the clip model. Like before, just for housekeeping purposes, we're going to color this green. This is now the positive conditioning, so we'll put it over here and get rid of the old one. We're going to do the same thing for the negative one: clone it, grab the clip model, clone our negative text input, and attach them like so. Oh sorry, this one is supposed to go to the bottom and this one to the top. We'll get rid of that old one and then add our negative conditioning.

Now, before we do anything, since we've left the prompt input and everything else the same, let's just go ahead and run this through the generator. We can see here that this is the image we got last time with the old setup, and just by changing out the CLIP component we've already got a drastically different image. This is what I was referring to about the new clip G and clip L having a very drastic impact on the final output. But like I said, let's finish fixing this up and we'll come back and play around with it.

As we saw earlier, we now also have a separate text encoder for the refiner, and once again this is a more simplified version of the regular one. So let's go ahead and grab the clip, convert our input to text, make it green and pop in the conditioning. You'll notice I've purposefully not attached the text component yet, and there's a reason for that. With the old CLIP text encode, because we were only dealing with one positive text box and one negative text box, it was easy to pass the information over to the refiner's clip encoder. In this case, though, the new CLIPTextEncodeSDXLRefiner only has one text input, but if we come back here we see that we've now got two text inputs for both positive and negative, so the question is which one to use. Common practice is to take the text box assigned to your L clip, because that's the one containing the style information, and since the refiner is already receiving an almost finished image, the subject details are less important; what you want to reinforce are the style and detail components that the refiner is going to work on. So, typically, if you want to keep it simple, keep your style elements and your subject elements separate and pass over the style elements, and it will reinforce them. If you really want to dive into it, you can also create a separate text box for the refiner and put in prompt details that focus on the finer details of the image, for example if the character is going to have certain freckles or facial features that may not appear in the image the refiner receives, because it will still be a little bit blurry. However, like I said, for now we're just going to leave it as is and send over the blank L clip.
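As a small organizational sketch (the prompt strings are placeholders, not the exact ones from the video), keeping the subject and style text in separate variables makes it obvious which string goes to which clip box and which one gets reused for the refiner:

```python
# Hypothetical prompt bookkeeping, assuming subject and style text are kept separate.
subject = "an astronaut sitting in a pile of fall leaves"       # clip G: the subject
style = "detailed digital photography, soft natural lighting"   # clip L: the style
negative = "blurry, deformed, extra limbs, watermark"

base_prompt_g = subject     # base model, clip G box
base_prompt_l = style       # base model, clip L box
refiner_prompt = style      # refiner has a single text input; reuse the style text
refiner_negative = negative
```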
So let's go ahead and delete the original one, and then we're just going to clone this one over, plug that into the negative, and then take the negative encoder. Just doing a little bit of housekeeping, I forgot to change these over to red, so let's do that to keep things nice and tidy. Okay, so all we've done at this point is add in the blank clip L text boxes and change out the encoders for their SDXL variants, so let's run it through. And I've just realized the mistake I made: the KSamplers are set on random, so we're just going to switch them to fixed and go back to the seeds that we were using originally in that last image, the original image. There we go. Now let's run that through again, and we can see a more realistic picture of what these changes are actually doing.

Interestingly, even though we are still using the same seed as before, the image is somewhat different but also somewhat the same. If we come back and look at the original one, this is what it looked like at the end of our last video, and with the same seed and relatively the same settings, just changing out the CLIP component, this is what it looks like now. This is what I was referring to earlier about working with what works best for you: I still personally prefer the output from the last video, but part of this process is about learning how the tools work, why they work a certain way, and then adjusting for what works best for you. And just because this isn't as good as what we had before doesn't mean we can't do a lot more with it.

So let's try making some adjustments and take some of the style phrasing out of the G clip and into the L clip. We've got "detailed digital photography" in here, so let's take those words out and put them in here, and remember that this one down here is attached to the refiner as well. On the negative side it's a little more challenging, because our negative prompts aren't really subject oriented; they're actually all style, with the exception of maybe "extra limbs" and "deformed". We've just made a small change to the L clip, so we're going to leave it there as is. If we queue that up, you can see that the change is definitely more subtle. One of the things I find frustrating about this change is that the faces actually seem to be worse now. If you've been struggling with faces in SDXL 1.0, as we are now, one of the techniques I like to use is combining Stable Diffusion 1.5 with SDXL, and if that's something that interests you, please leave a comment down below and it will be a video I make in the future.

In the last video the model had decided to put our astronaut in an environment where it was surrounded by fall leaves, and if we look at the current one, it looks like our character is back in space. So let's see if we can bring those leaves back. And behold, by putting the leaves back in clip G we find that not only has the astronaut come back down to earth and is once again surrounded by leaves, but the face has also improved quite substantially.
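Since the comparisons above only make sense with a fixed seed, here is a small sketch of the same idea outside ComfyUI, again assuming the diffusers library; the seed value and prompts are placeholders, not the ones used in the video.

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

SEED = 1234  # placeholder; the point is reusing the same value for every run

# Run 1: baseline prompt
gen = torch.Generator(device="cuda").manual_seed(SEED)
before = pipe("an astronaut surrounded by fall leaves",
              num_inference_steps=20, generator=gen).images[0]

# Run 2: tweaked prompt, same seed, so any difference comes from the prompt change
gen = torch.Generator(device="cuda").manual_seed(SEED)
after = pipe("an astronaut surrounded by fall leaves, detailed digital photography",
             num_inference_steps=20, generator=gen).images[0]
```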
However, for the purposes of this video, let's continue and try to improve on this. I also forgot to mention one thing (I recorded this part after everything else): it was brought up to me after the last video, and I did find it in my extended research, that the preferred method when using the base model and refiner is to adopt an 80/20 rule. What does that mean? It means you want 80% of the process completed by the base model and 20% by the refiner. However, particularly in ComfyUI, I have found that this does not always produce the best results, and oftentimes, depending on the sampler you're using, settings that aren't perfectly aligned can produce some pretty great results. For the purposes of education, though, we're going to cover it briefly now.

So what does 80/20 mean? When you're creating an image, we can define a number of steps, which tells the AI model that you want it to follow a denoising process over, say, 20 steps: it will look at the noise in the image and remove a certain amount at every step to arrive at the final image. What we're doing when we define a start step and an end step is saying: if you need to remove one hundred percent of the noise over 20 steps and I'm asking you to stop at step 16, then every step removes roughly five percent of the noise, so at step 16 there will still be about 20% of the noise left over. When that gets sent into the refiner model, in theory you should also have your steps set at 20, because you're starting at that 80% denoised point and just taking off the remaining noise over the last four steps, from 16 to 20.

However, I've found that if you do that (and we're going to do it now, so let's queue that up), the final output image is still relatively noisy, which is why I ended up letting the refiner go through a few more steps to help clean up the image. But for those of you wondering, that is usually a good place to start: pick the number of steps you want, whether it's 20, 24 or 50 (that depends on the sampler you choose to use, and if you want to understand the best settings for different samplers, let me know in the comments below and I'll make another video on that, since different samplers have different optimum step counts), start with the 80/20 rule, and if you're still not happy with the output, start tweaking the steps in the refiner to fine-tune the image. Now let's get back to the main video.
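To make the arithmetic concrete: with 20 steps and an end step of 16, the base model handles 16/20 = 80% of the schedule and the refiner picks up the remaining 20%. Below is a hedged sketch of the same split using the diffusers library rather than ComfyUI's advanced KSampler nodes; denoising_end and denoising_start are how that library expresses the handoff, the checkpoint names are the public SDXL 1.0 repos, and the prompts are placeholders.

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,  # share the big text encoder to save VRAM
    vae=base.vae,
    torch_dtype=torch.float16,
).to("cuda")

steps = 20
split = 0.8  # 80/20 rule: base runs steps 0-16, refiner runs steps 16-20

latents = base(
    prompt="an astronaut surrounded by fall leaves",
    num_inference_steps=steps,
    denoising_end=split,
    output_type="latent",        # hand the still-noisy latent straight to the refiner
).images

image = refiner(
    prompt="detailed digital photography",  # style/detail text, as discussed above
    image=latents,
    num_inference_steps=steps,
    denoising_start=split,
).images[0]
image.save("astronaut_refined.png")
```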
So, if we're quite happy with this image, how do we go about upscaling it? The current version is still only 1024 by 1024 pixels; what if we want to use this image in an environment where we need it to be bigger, where we want to zoom in, with more clarity and more detail? Well, in Automatic1111 there is an upscaling section, and that's quite straightforward: you pick the model and it makes the image bigger. With ComfyUI we're going to do essentially the same thing: we're going to select the node that allows us to choose an upscaling model, pass the image through, and the output will be a larger image. It's relatively straightforward. There are actually two ways we can go about this, and we're going to explore upscaling the image with both techniques.

The first is the latent upscale node, and there should be two of these by default with your ComfyUI installation. Essentially, as we discussed in the previous video, the AI model uses what is called the latent format to understand the image it's going to generate, so that image doesn't yet exist in pixel form. With this node we're going to attempt to make the latent image bigger before we decode it into an actual pixel image. Now, to be quite honest, this is the least preferred method, as I've found in my own experimentation that the results are not always great, but I think it's important to explore it as part of the process. In here we have various methods by which the upscaling of the latent will be attempted, and we're actually going to try all of them. The main difference between these two nodes is that in the top one we can designate the pixel size of the output we want, in this case 512 by 512, and whether we allow the image to be cropped, whereas in the other one we just indicate a factor by which we want to increase the image: if it's 1024 by 1024 and we scale by a factor of 2, it becomes 2048 by 2048, and so on. Since for the model upscalers we'll use later we will be trying 4x, let's compare apples to apples and go for a 4x increase here as well. I'm just going to use the bottom node for ease of use, as we're going to have multiple nodes running at the same time, and like I said we're going to try the different upscale methods to see what output we get.

Then of course we need to decode this. In case I haven't mentioned it before (I don't think I have), you'll notice that I've brought the VAE out and I'm using the reroute option here in the menu. This allows us to create intermediary nodes just to help organize how the lines flow; in this case I've brought the VAE out with a reroute node, and from there I can easily plug it into all of the decode nodes we've got here. We'll pop this over here so it's not in the way, and finally, for the output, we're just going to drop in preview nodes, because at this point I'm not interested in saving the output. Okay, let's go ahead and run that and see. Oh, of course, I forgot to plug in the latent, so just like before we'll drop in a reroute node to make it easier to plug in the latent without having to keep dragging so far out or moving things around. Okay, now we're ready to go.

And here we go: these are the results of the image being upscaled using the latent upscaler, and you can see that none of the results are particularly good. The reason is that, even though we are upscaling a latent, the technique is very similar to traditional upscaling, in the sense that all you're doing is stretching out the image and then resampling what is already there to fill in the gaps, rather than having the AI model understand the image and recreate it at the size you want, going back in and replacing the details. Here all we're doing is taking what's there and multiplying it as we make it bigger, and that's why it's not the preferred method for upscaling, particularly at the bigger factors we're using now with 4x. It probably works out fine if you're doing smaller levels of upscaling, maybe 1.2x or 1.5x, but anything beyond that and you'll end up with results like the ones we're seeing now. So if we're not going to upscale using the latent, then what is the preferred method for upscaling?
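For intuition, a latent upscale boils down to resampling a tensor. This is a minimal sketch assuming the latent is a PyTorch tensor of shape (batch, 4, height/8, width/8), which is how SDXL represents a 1024x1024 image internally; the different upscale methods in the node correspond roughly to standard interpolation modes.

```python
import torch
import torch.nn.functional as F

latent = torch.randn(1, 4, 128, 128)  # stand-in for the latent of a 1024x1024 image

# Try a few resampling modes at 4x, analogous to the node's upscale method choices
upscaled = {
    mode: F.interpolate(latent, scale_factor=4, mode=mode)
    for mode in ("nearest", "bilinear", "bicubic")
}

for mode, lat in upscaled.items():
    print(mode, tuple(lat.shape))  # every variant is (1, 4, 512, 512), i.e. 4096x4096 once decoded

# No new detail is created here: the existing latent values are only stretched and
# resampled, which is why the decoded results look soft or smeared at 4x.
```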
Well, first we're just going to go ahead and delete all of these, and then we're going to grab the upscale model loader and the image-upscale-with-model nodes. What we're doing here is that, even though the input is an image, i.e. pixels, we're going to use an AI model to look at that image, and much like the resizing we did before, it will resize the image and then go in and fill in the details that get blurred out in the upscaling process. In this case, instead of grabbing a latent, we don't need the VAE anymore; we're going to grab the image as an input and get an image as the output. Now here we've actually got a few options: the standard model is the 4x UltraSharp .pth, but we've also got this 8x one here, so we're going to go ahead and try that one out as well. Just like before, we'll grab the image, and now let's test how these models perform.

And here we go. You can already see at a glance that the image quality is significantly better than using the latent upscaler. Let's have a quick look and open these up. This is the original 1024 by 1024, and if we try to zoom in, this is as far as we get. If we look at the 4096 version and zoom in, we can come in a lot closer and really start to see the details of the image. Interestingly, as we come in and look at those details, we start to see a lot of things that aren't quite there yet, certain things that look a little funny and odd, and to be quite frank I'm still not 100% happy with the face. If we look at the 8x one, we can zoom in a lot more; this is a significantly bigger image. Having the capacity to come in this close, we can see that the image is still not perfect: while elements of it are excellent, such as the hair, there are parts with a little bit of almost static noise, and some fuzzy lines that haven't upscaled very well. A lot of this is still a work in progress and I am still experimenting with models. If you want to try some other upscaling models, you can head over to Hugging Face, where there are other 4x and 8x models you can try. Unfortunately I was not able to download these in time to prepare this video, but I will post my results on the Patreon I mentioned earlier.

I hope this video was helpful. We haven't covered too much more advanced material compared to what we did last time, but I felt it was important to cover these topics, as well as upscaling, before diving into more advanced topics such as ControlNet, which we'll be covering in the next video. Some of you have also asked about how to create custom characters that you can reuse in ComfyUI, and that will be the video we cover after we start diving into ControlNet.
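As a rough outline of what model-based upscaling involves under the hood (not ComfyUI's actual implementation), the pattern is: load the upscaler network, convert the image to a float tensor, run it through the network, and convert back. The load_upscale_model helper below is hypothetical, standing in for whatever loads the 4x .pth file into a PyTorch module; the rest is standard tensor plumbing.

```python
import numpy as np
import torch
from PIL import Image

def upscale_with_model(model: torch.nn.Module, path: str) -> Image.Image:
    """Run an ESRGAN-style 4x upscaler over an image file and return the result."""
    img = Image.open(path).convert("RGB")
    # HWC uint8 -> NCHW float in [0, 1], the usual input layout for these networks
    x = torch.from_numpy(np.array(img)).float().div(255.0)
    x = x.permute(2, 0, 1).unsqueeze(0)
    with torch.no_grad():
        y = model(x)  # e.g. (1, 3, 1024, 1024) -> (1, 3, 4096, 4096)
    y = y.clamp(0.0, 1.0).squeeze(0).permute(1, 2, 0)
    return Image.fromarray((y.numpy() * 255.0).round().astype(np.uint8))

# model = load_upscale_model("4x-UltraSharp.pth")   # hypothetical loader
# upscale_with_model(model, "astronaut.png").save("astronaut_4x.png")
```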
Info
Channel: Endangered AI
Views: 1,363
Keywords: AI, SDXL, Stable Diffusion, Comfy UI, Comfyui, Automatic 1111, generative ai, ai art, tutorial, ai tutorial
Id: 6UclQf6HE_I
Length: 23min 29sec (1409 seconds)
Published: Sun Sep 10 2023