Understanding ComfyUI Nodes: A Comprehensive Guide

Captions
Hello everyone, welcome to this video on understanding ComfyUI nodes. In today's video I will talk about the basic built-in nodes from ComfyUI and explain how they work, and I'll also talk a little bit about the technical side of a few of them. Since there are a lot of custom nodes, I will not be able to cover them all; instead I'm just going to focus on the built-in ones. If there is a specific custom node you want me to cover in a future video, let me know in the comments down below.

Let's get started with the most common node, in my opinion, and that is the Load Checkpoint node (the "simple" checkpoint loader). You can find it by right-clicking, clicking Add Node, going into loaders, and selecting Load Checkpoint. Alternatively, you can double-click, search for "checkpoint", and select Load Checkpoint. This node is quite straightforward to understand: it has a single input, the checkpoint name. These checkpoints can be found under the ComfyUI folder, inside models and then checkpoints; if you do not have any, this is the folder where you paste your checkpoint files. The main purpose of this node is to load the checkpoint (the model) without any additional configuration. If you search for "load checkpoint", you will see that there used to be a node called Load Checkpoint (With Config), which is now marked as deprecated. That's because we no longer need it: the simple version replaces it. Previously we had to provide the model plus a configuration file, but with the simple checkpoint loader that is no longer required; it automatically picks the correct configuration.

The checkpoint name is basically a string in code (a Python str). This means we can right-click on the node and select "convert checkpoint name to input", which gives us an input socket. We can drag out from that input, select Add Node > utils > Primitive, and now we have a primitive node that controls the value of the checkpoint name in the Load Checkpoint node. For example, we can set "control after generate" to randomize, and then each time we click Queue Prompt it will pick a random checkpoint. However, this process requires loading and unloading checkpoints multiple times, so if you're experimenting I would suggest going with incremental instead: select your first checkpoint, and each run it will simply move on to the next one, the second, third, fourth, and so on, for however many checkpoints you have.

This node is one of the few that takes one input but has three different outputs. The first one is MODEL, which most of the time goes into a KSampler. The second one is CLIP, which goes into a CLIP Text Encode node. The last one is VAE; in a simple text-to-image workflow the VAE goes into a VAE Decode node, which usually comes after the KSampler. So the order is: the CLIP output goes into a CLIP Text Encode (this can be the positive or the negative prompt), the MODEL goes into the KSampler, and the VAE is the last thing in the pipeline, inside the VAE Decode.

Let's start with CLIP. The main purpose of CLIP is to take the positive and the negative prompts (they both work the same way) and do something called tokenization, converting the text into individual tokens. Once we have these tokens, it converts them into a set of numbers.
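To make that tokenization step concrete, here is a rough sketch using the Hugging Face CLIP tokenizer. This is only an illustration of the idea; ComfyUI wraps its own CLIP implementation, and the model name below is an assumption:

    from transformers import CLIPTokenizer

    # The tokenizer used by Stable Diffusion 1.x models (assumed here for illustration).
    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

    prompt = "a photograph of a cat sitting on a table"

    tokens = tokenizer.tokenize(prompt)   # a list of subword strings
    ids = tokenizer.encode(prompt)        # the matching integer IDs (start/end tokens added)

    print(tokens)
    print(ids)

The text encoder then turns those integer IDs into the numeric conditioning vectors that the sampler consumes.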
Those numbers are what the conditioning is, so that's the process of CLIP. For example, let's say you type in "a woman". This may get converted to "1girl"; it depends on the checkpoint you have, but most of the time you will see this, especially if you're using an anime-style model. If you type "1girl", that also gets converted to the number for "1girl", and "lady" will be mapped to "1girl" as well. So that is the token, and because this is still in text format, and computers in machine learning generally prefer numbers, the text gets converted into numbers. The output is the conditioning, and that conditioning goes to the KSampler. So that's basically CLIP, and this whole process happens inside the CLIP Text Encode node.

Now let's take a step back and go back to the MODEL output. The model is just the UNet, so if you're training your own model you may have it in UNet format, and you can load a UNet individually by double-clicking and searching for "unet"; there is a UNETLoader node. Similarly, the CLIP can be loaded by itself with the Load CLIP node, and the VAE has one as well, Load VAE, so you can load the VAE individually. In other words, the Load Checkpoint node is doing the job of the UNET Loader, the Load CLIP, and the Load VAE all in one. When you go online to download a checkpoint, you're most likely downloading a .safetensors or maybe a .ckpt file, and you can think of that .safetensors or .ckpt file as a container: inside it there is the UNet, the CLIP, and the VAE. The Load Checkpoint node's main purpose is to separate these three and hand them to you so you can pass them into the pipeline.

We've already talked about the CLIP Text Encode and looked at the model, so let's take a look at the VAE. Before that, we should talk about latent diffusion models. Let's say we have a picture in pixel space. A pixel is normally represented by values like 255, 255, 0, one number per channel (R, G, B). What we see is an image, but when we pass it over to a computer, it just gets these numbers. If we train a model using this format, we give it a bunch of numbers and expect it to output the same thing. What could happen is that this is the picture we expect to get, but the model generates this other image here, and at first glance they may look exactly the same. So this is our expected result and this is what the model generated, and right now we are talking about pixel space. When we get a result like that, we usually calculate a loss, and based on the loss we tell the model how to improve. But with these small changes — for those of you who did not spot the difference, one eye is at this position and the other is slightly lower — that is obviously not correct to us, yet for the model the loss would be quite low, let's say 0.2. That is not enough to push the model in a good direction.

So what the Stable Diffusion team did was to go from a picture in pixel space to a latent space, and in ComfyUI this conversion is done by the VAE Encode node. One way to think of it is that at first we have 255, 255, 0, and the encoder converts that into numbers like this. This is not an exact representation; I'm only using it as an illustration.
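As a rough illustration of that round trip between pixel space and latent space, here is a sketch using the diffusers library's AutoencoderKL. The model ID and the input file are assumptions for illustration; ComfyUI uses its own VAE wrapper internally:

    import torch
    from diffusers import AutoencoderKL
    from diffusers.utils import load_image
    from torchvision.transforms.functional import to_tensor

    # A commonly used Stable Diffusion VAE (assumed model ID).
    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

    image = load_image("input.png").resize((512, 512))      # placeholder path
    pixels = to_tensor(image).unsqueeze(0) * 2.0 - 1.0       # pixel space, scaled to [-1, 1]

    with torch.no_grad():
        latents = vae.encode(pixels).latent_dist.sample()    # latent space, e.g. [1, 4, 64, 64]
        decoded = vae.decode(latents).sample                 # back to pixel space, [1, 3, 512, 512]

    # The round trip is lossy: decoded will be close to, but not exactly equal to, pixels.

Note how a 512x512 RGB image becomes a much smaller 4-channel latent, which is exactly why diffusion in latent space is cheaper than in pixel space.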
The exact transformation from pixel values to the latent-space representation depends on the specific architecture and training of the model; it is not a simple linear mapping. The latent representation typically encodes features or characteristics of the input image in a lower-dimensional space, and the purpose of this latent space is to capture the essential information about the input image in a more compact and structured form that facilitates image generation. To us these numbers do not make a lot of sense, but for the computer it's actually better, because now if we go from this latent space and try to generate the same image, the loss will increase, and we can tell the model to improve and move in a particular direction. Of course, if we go from pixel space to latent space, we need some way to go from latent space back to pixel space, and that is done by the VAE Decode node inside ComfyUI.

All right, so now we know about the model, the CLIP, and the variational autoencoder (VAE). In a text-to-image workflow, we generate the latent-space image with an Empty Latent Image node, which gives us a latent image directly. In an image-to-image workflow, however, we have a Load Image node, and because that node just loads an image, we are still in pixel space; we use the VAE Encode node to convert it into latent space, and that is where we drag the VAE output into the VAE Encode so it can do the conversion. One thing to note here is that when we go from pixel space to latent space, we lose a little bit of information. So if you have a complex workflow where you go from image to latent, decode back to pixel space, take that output image, pass it to, say, an upscaler, and then go back into another KSampler for a second pass, each time you do the encoding and decoding you lose a little bit of information.

All right, so now I have one CLIP Text Encode in green, which will be the positive, and the negative one in red; I'm going to set the negative to some safeguards for YouTube. Next, let's go into the most complex node, I think, which is the KSampler, because as you can see it takes a lot of inputs and outputs just one thing. This is the most common node type you will see: multiple inputs going into a node and a single output coming out. The KSampler requires the model, which comes from the Load Checkpoint, and it requires a positive and a negative, which are the conditionings from the CLIP Text Encode nodes: we take the positive CLIP Text Encode into the positive input and the negative into the negative input. For the latent image, we can grab it from the Empty Latent Image, or we can take an image, convert it into latent space, and pass that over to the KSampler.

The KSampler is designed to do the sampling steps — basically the sampling operation. For now, think of it like this: we start with an image that is just noise, a bunch of noise with nothing in it. What the KSampler does is, at each step, denoise it slightly so that we start getting a pattern and then an image. At this point maybe we get the head, then we start getting the hair, and finally we have the actual image. This process of going from a noisy image to a clear image is the responsibility of the KSampler.
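Conceptually, the sampling loop looks something like the sketch below. This is a deliberately simplified illustration, not ComfyUI's actual sampler code; predict_noise and scheduler_step are toy stand-ins for the UNet call and the sampler/scheduler update:

    import torch

    def predict_noise(latent, step):
        # Stand-in for the UNet: a real model predicts the noise present in `latent`,
        # guided by the conditioning. Here we just pretend the latent itself is the noise.
        return latent

    def scheduler_step(latent, noise_pred, step, steps):
        # Stand-in for the sampler/scheduler update: remove a fraction of the
        # predicted noise, with less and less noise left as `step` approaches `steps`.
        return latent - noise_pred / (steps - step)

    def sample(steps=20, seed=0, shape=(1, 4, 64, 64)):
        generator = torch.Generator().manual_seed(seed)
        latent = torch.randn(shape, generator=generator)   # start from pure noise; same seed -> same noise
        for step in range(steps):
            noise_pred = predict_noise(latent, step)
            latent = scheduler_step(latent, noise_pred, step, steps)
        return latent   # a (mostly) denoised latent, ready for VAE Decode

The real samplers differ in how scheduler_step is computed, which is what makes them converge or diverge, as discussed later.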
By the way, if you want to see how exactly this happens in the code, the KSampler class just has a function called sample, and inside it, it uses another function called common_ksampler; to read the code you will need to look for the common_ksampler function and see how it works. So these are all the inputs the KSampler requires, and it just outputs a latent.

Back in ComfyUI, the latent goes into a VAE Decode node, and we've already talked about what the VAE Decode does. It requires the samples input, which is just the latent, and the VAE, which we get from the model (you can use the Load VAE node to load a specific VAE if you have one). This node converts from latent space back into an image, which we can view using a Save Image or a Preview Image node.

Now let's talk about the seed. Think of the seed number as the starting point for generating pseudo-random numbers. Basically, if all other variables are the same and you use the same seed, you will get the same result each time; the seed ensures reproducibility and helps you get consistent results. In ComfyUI, the seed has a minimum value of zero and a maximum value shown as this hexadecimal number; if we convert that hexadecimal number to an integer in Python, we get this very big number. For the most part, you should not be running out of seed values when generating images. Think of it like this: you take this big number, multiply it by the number of steps, multiply that by the number of KSamplers you have, multiply the result by the number of noise schedulers, multiply that by the positive prompt (basically how many tokens the model allows), and multiply that by the negative prompt — you get an enormous number of images you could generate.

Next, let's talk about steps, or inference steps. In Stable Diffusion, steps refer to the number of steps the diffusion process takes to generate samples, and remember the samples are just the intermediate latent images. Each step involves iteratively refining the generated sample by adding noise and then denoising it; this of course depends on the sampler, basically the algorithm used during the sampling process. As for CFG, it's known as the classifier-free guidance scale, and this scale determines how closely we follow the prompt: the higher the CFG, the more the model will follow the prompt (or the input image); the lower the CFG, the more freedom and randomness the model gets when generating the image.
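In code, classifier-free guidance usually shows up as a simple blend between an unconditional and a conditional noise prediction. Here is a minimal sketch of that formula; the tensors are random placeholders just to show the shapes involved:

    import torch

    def apply_cfg(noise_uncond: torch.Tensor, noise_cond: torch.Tensor, cfg: float) -> torch.Tensor:
        # cfg = 1.0 keeps the unconditional prediction plus the full conditional difference;
        # higher cfg pushes the result further toward the prompt-conditioned prediction.
        return noise_uncond + cfg * (noise_cond - noise_uncond)

    uncond = torch.randn(1, 4, 64, 64)   # prediction made with the empty/negative prompt
    cond = torch.randn(1, 4, 64, 64)     # prediction made with the positive prompt
    guided = apply_cfg(uncond, cond, cfg=7.0)

This also explains the behaviour described above: a very high cfg exaggerates the difference term, which is why images can start to look "burnt".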
The sampler and the scheduler are basically the algorithms used when going through these steps and trying to reach the clear image. Some of the algorithms are converging — let me zoom out a little bit. If you start from this latent image, then no matter the number of steps, you will always reach a similar output: even at an early stage we get a hint of the final image, and if we continue with more steps we just keep refining the same picture. If the step count is a little too high we may get some noise at the end, but the overall composition of the image will stay the same. Other algorithms are diverging, which means that, say, at step 20 we have one image, but if we go to step 40 we get a completely different image; it starts to diverge.

Here's an example of a sampler that converges toward one image. Right now I am using Euler: with one step I get this image, and you can already see something there. If I keep going to step two, step three, then at step four we can see a cat (that was my prompt, by the way: a cat), we have a table, and as I increase the number of steps we get the same cat in the same position, just with more detail added to the overall image. At step 15 it's again the same cat sitting in the same position, with the same table and the same elements in the background, just more detail. If I increase the number of steps to 40, you can see it's again the same cat.

Now I've changed the sampler to Euler a (the "a" stands for ancestral), and what happens is that at each step a little bit of noise is added during the sampling. This is an example of a sampler that diverges: it simply cannot converge toward a single image. Same thing: we have the cat here, and at step three we have two cats, at step four the cat is changing into a woman, and what we are seeing right now is that the CFG is too low because of that added noise. By the way, for Euler I was using the same CFG and it was able to give me the cat all the way up to step 40, but for Euler a I've increased the CFG, and you can see the image is slightly burnt now, but that's okay. At step one we have one image, at step three we now have a cat, at step four a different cat, at step five a different cat again, and as I keep going up the steps you will see that the cat changes and the environment changes as well, and it keeps going like this. This is something you need to look into when you download a model: try to see which samplers work best for that particular model.

Now, positive and negative: these are just the conditioning, the text we pass over to the sampler. The latent image just comes from the Empty Latent Image or from an input image that you encode into latent space. As for the noise scheduler, it basically determines how much noise there should be at each step. Here's an example: say this is step number one, step number 20 is here, and step 10 is somewhere in the middle. At step one we have the highest amount of noise, and as we go along, the amount of noise decreases at each step. The noise scheduler is responsible for determining how much noise there should be at each step, because the sampler uses that estimate to decide how much noise to remove at each step.

Lastly, we have denoise, and the denoising factor is used in image-to-image workflows. We have an input image, we pass it to the VAE Encode to get a latent image from it, and we pass that over to the KSampler. The denoising factor tells us how much of that image we want to keep and how much we want to change. If we have a denoising factor of 1.0, it means we do not want to keep anything from the original image and we want to change everything. If we have a denoising factor of 0.25, or 25%, it means we want to change 25% of the image and keep 75% of it. So a low denoising factor means we mostly keep the image; the higher the denoising factor, the more we change the image.
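One rough way to think about denoise in terms of steps — a conceptual sketch only, not ComfyUI's exact internal math — is that the sampler skips the earliest, noisiest part of the schedule and only runs the last fraction of the steps on your encoded image:

    def steps_actually_run(steps: int, denoise: float) -> int:
        # Conceptually, denoise is the fraction of the schedule applied to the input latent:
        # denoise 1.0  -> all steps run (image fully regenerated),
        # denoise 0.25 -> roughly the last quarter of the steps (image mostly preserved).
        return round(steps * denoise)

    print(steps_actually_run(20, 1.0))    # 20
    print(steps_actually_run(20, 0.25))   # 5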
All right, so we've taken a look at how Stable Diffusion works and what the different nodes are for text-to-image as well as image-to-image workflows. Now let's talk about a couple of utility nodes that are built into ComfyUI. The first one is found under Add Node > utils > Note, and this one is simple: it just lets you add notes for yourself. What I usually do with it is this: if I'm using SDXL models, they have very specific resolutions you want to use, something like 1024 by 1024, so you can keep those in a Note. It stays in your workflow, and anytime you need to refer to it you can just check it — the different resolutions, or maybe your preferred KSampler settings, you can put all of that into a Note.

We've already seen the Primitive. A Primitive basically gives you an input widget, and it also gives you a "control after generate" option. For example, here we have steps at 4 and CFG at 1; let's say we want to find the best CFG for this model. Again, convert the cfg widget to an input and connect it to a Primitive node. Now I have control after each generation: I can change it to incremental and start at, let's say, 0.1. The moment I click Queue Prompt (let me show my queue), it increments to 0.2; if I click Queue Prompt again it goes to 0.3, and it keeps going, so I can click Queue Prompt multiple times. I can also use Extra Options and set a batch count, and it will increment for however many runs I queued. You can wait for it to complete, look at the output images, see which one is best, and pick that one. And since we have the Note node, you can add a note saying "best CFG" and write that CFG value in it.

The last one under utils is the Reroute node. This one is for when you have a big workflow: say you're going from here to somewhere over there — let's bring in another KSampler — and I want this model output to go all the way to that KSampler. You can see it's quite far, and it's a little difficult to grab the connection and drag it over. So I can use the Reroute node: take the model into the Reroute, and now I have a point I can drag over to where I need it and connect through it. That's basically the Reroute node.

Under sampling you will also see custom sampling, with custom samplers, schedulers, and sigmas. Those basically do the same job as the KSampler, just with additional control, and when an experimental technique comes out it will usually use a custom sampler if it has not yet been implemented in the normal sampler. We also have the KSampler (Advanced), and this one gives you a little more control over how the noise carries over from one KSampler to another. For example, let's say I'm doing two passes, going from a first KSampler to a second one. I can choose to return with leftover noise, and when I pass that over to the second sampler, the second one continues with whatever noise was remaining. For that to work, I need to change the start and end steps: say we are doing 20 steps in total, I can do the first 10 steps with the first KSampler, and then the remaining 10 steps, with the remaining noise, go over to the second KSampler so that it continues the generation. You will see this often in SDXL base workflows, where we have two different checkpoints: the SDXL base and the refiner checkpoint. The refiner is used in the second KSampler, and the base one goes into the first.
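As a sketch of how the two KSampler (Advanced) nodes would be set up for such a 20-step, two-pass base/refiner workflow — the field names follow ComfyUI's advanced sampler widgets, but treat the specific values as an illustrative assumption:

    total_steps = 20
    handoff = 10   # where the base pass stops and the refiner pass takes over

    base_pass = {
        "add_noise": "enable",                  # the base pass starts from fresh noise
        "steps": total_steps,
        "start_at_step": 0,
        "end_at_step": handoff,
        "return_with_leftover_noise": "enable", # hand the partially denoised latent onward
    }

    refiner_pass = {
        "add_noise": "disable",                 # continue from the leftover noise instead
        "steps": total_steps,
        "start_at_step": handoff,
        "end_at_step": total_steps,
        "return_with_leftover_noise": "disable",
    }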
Of course, we have other nodes, like Upscale Latent, but if you understand this basic workflow you have all the necessary components to generate text-to-image as well as image-to-image; the rest is pretty much self-explanatory. For example, if you go into latent, there is Upscale Latent, and it just takes a latent — this one, from the KSampler — and upscales it. Right now we are at 1024, and doubling it gives 2048; it just doubles it. You then need to go from this upscaler into another KSampler in order to go through the sampling steps again.

All right, this video is getting a little bit long, so I think this is where I'm going to stop. If you have a specific node or a specific group of nodes you want me to cover in a future video, let me know in the comments down below. If you've watched until the very end and you've learned something new, let me know by clicking the like button, subscribe to the channel if you have not done so already, and I will see you in the next one.
Info
Channel: Code Crafters Corner
Views: 1,316
Keywords: comfyui, Code Crafters Corner, CodeCraftersCorner, ComfyUI, Stable Diffusion, Text-to-Image, Image-to-Image, Machine Learning, Tutorial, Checkpoint Loader, KSampler, Clip Text Encode, VAE Decode, Noise Scheduler, Workflow Optimization.
Id: nplW9vBCUxg
Length: 27min 29sec (1649 seconds)
Published: Tue Apr 09 2024