ComfyUI 101 - The Basics | Tutorial

Captions
Hey guys, this video covers the basics of ComfyUI for someone who doesn't really understand it. ComfyUI allows you to generate these amazing images you see on screen here, all for free, using open source software. If you'd like to install ComfyUI, you can check out this video up here. So let's get into the basics right now.

The first thing we're going to talk about is model files. These are big files you can download from Hugging Face or Civitai. Civitai is a community of people who are pushing the boundaries of generative AI, while Hugging Face is a more commercial website that companies use to upload their models. What do these large models actually contain? You can see here they have these file extensions and they're really big, around 6 GB. They contain weights for three different model types: the CLIP, the main model, and the VAE. In ComfyUI you can see here on the left "ckpt_name", which is the model name referenced by the Load Checkpoint (CheckpointLoaderSimple) node. What this node does is let you load up your models, as you can see here with some more model files.

So let's start with the CLIP. The CLIP connects to a CLIP Text Encode node for one prompt and another CLIP Text Encode node for a second prompt over here, symbolised by these two yellow lines. The CLIP is used in Stable Diffusion to encode the text into a format that the main model can understand; another name for it is a text encoder. The first is referenced as the positive prompt and the second as the negative prompt, for example "bad res, CGI, airbrush". The negative prompt lists the things you don't want in your image, and the positive prompt is what you want Stable Diffusion to construct.

In Stable Diffusion the image is generated using what is referred to as a sampler, represented here by the KSampler node. The sampler takes the main Stable Diffusion model as input, so it takes the main model over here that you loaded. The two conditioning nodes go into the positive and negative inputs of the KSampler, so, as mentioned, it takes the positive and negative prompts encoded by the CLIP encoder.

The final input it takes is the latent image. In this instance the latent image is blank: you can set the width, the height, and how many images you want, so if you set the batch size to 10, it'll generate 10 images. If you wanted to replace the face in an image with another face, you would put an image in here, which would be used as the latent image. As we're only generating text-to-image, we're passing it an empty image; that's why it's referenced as Empty Latent Image over there.

So the sampler takes this input latent image, adds noise to it, and then denoises it using the main model. If you set this denoise setting over here to a lower value, the output stays closer to the input image and doesn't give Stable Diffusion as much freedom to use its own imagination with the image. The encoded positive and negative prompts are passed to the model at each sampling step and are used to guide the denoising, so these two prompts over here guide the denoising that is done. This gradual denoising is how Stable Diffusion generates images, and the sampler then outputs a denoised image.
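Everything described so far maps onto a small node graph. As a reference, here is a minimal sketch of that text-to-image graph in ComfyUI's API (JSON) format, written as a Python dict; the node ids, checkpoint filename, prompts, and sampler settings are illustrative placeholders, not values taken from the video.

# A minimal text-to-image graph in ComfyUI's API (JSON) format; each key is a
# node id and each ["id", n] pair wires one node's nth output into another node.
graph = {
    "4": {"class_type": "CheckpointLoaderSimple",   # loads MODEL, CLIP and VAE
          "inputs": {"ckpt_name": "juggernautXL.safetensors"}},  # placeholder name
    "6": {"class_type": "CLIPTextEncode",           # positive prompt
          "inputs": {"clip": ["4", 1],
                     "text": "a castle on a hill at sunset, photorealistic"}},
    "7": {"class_type": "CLIPTextEncode",           # negative prompt
          "inputs": {"clip": ["4", 1],
                     "text": "bad res, CGI, airbrush"}},
    "5": {"class_type": "EmptyLatentImage",         # blank latent for text-to-image
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "3": {"class_type": "KSampler",
          "inputs": {"model": ["4", 0],             # main model from the loader
                     "positive": ["6", 0],          # encoded positive conditioning
                     "negative": ["7", 0],          # encoded negative conditioning
                     "latent_image": ["5", 0],
                     "seed": 123456789,             # fix this to reproduce an image
                     "steps": 30,
                     "cfg": 4.0,
                     "sampler_name": "dpmpp_sde",
                     "scheduler": "normal",
                     "denoise": 1.0}},              # 1.0 = start from pure noise
}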
The third and final model used by Stable Diffusion is the VAE, which decodes the generated, denoised image. The VAE is used to translate an image from latent space to pixel space. Latent space is the format the main diffusion model understands, while pixel space is the format your image viewer understands. So at this point the image is in latent space, and the VAE Decode node takes it and converts it into pixel space so that we can get the final image for our image viewer. You can see here the VAE Decode node takes the latent image coming from the sampler as input and outputs a regular image, which is then saved as a PNG file with the Save Image node, referenced over here. This completes the overview of a basic text-to-image workflow.

If you wanted to generate something with ComfyUI, to download this Juggernaut XL model you would just hit download; it's 6.6 GB. And if you wanted to generate this image over here, you would just click on it and you would see the prompts: you can copy this positive prompt and paste it in there, then take the negative prompt, copy it, and paste it in there, and you can set the resolution as well. If you want an image similar to that one, you can set the seed; over here they provide you the seed, so you can enter it and set it to fixed. The steps you can adjust to control how many iterations the denoising takes. The CFG, referenced as the guidance, you can generally set to around 2 or 4. The sampler is DPM++ SDE, which is set there, and I'm using a denoise of 1. So when I run this prompt, the sampler takes the empty latent, adds noise to it, denoises it in latent space guided by whatever you put in the conditioning, and that's the final image. Leave a like, guys, if you enjoyed that, and subscribe for more videos just like this one.
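To round out the sketch above, the remaining two nodes and the submission step might look like this, assuming the graph dict from the earlier sketch is in scope and a ComfyUI server is running at its default local address:

import json
import urllib.request

# Decode the sampled latent back to pixel space and save it as a PNG.
graph.update({
    "8": {"class_type": "VAEDecode",
          "inputs": {"samples": ["3", 0],   # denoised latent from the KSampler
                     "vae": ["4", 2]}},     # VAE weights from the checkpoint loader
    "9": {"class_type": "SaveImage",
          "inputs": {"images": ["8", 0], "filename_prefix": "ComfyUI"}},
})

# Queue the whole graph on the local ComfyUI server via its /prompt endpoint.
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": graph}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # the server replies with a prompt_id

The finished PNG lands in ComfyUI's output folder, just as it does when you run the Save Image node from the editor.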
Info
Channel: AI Made Simple.
Views: 665
Id: 1K0MGCF5RM4
Length: 6min 26sec (386 seconds)
Published: Thu Jun 27 2024