ComfyUI: CosXL, CosXL Edit InstructPix2Pix (Workflow Tutorial)

Captions
Hi, I am Mali, welcome to the channel. From pitch blacks to bright whites, the CosXL model is an experimental take on the Stable Diffusion XL model by Stability AI. It offers rich tonality, good consistency, and impressive overall detail. Using CosXL Edit with InstructPix2Pix, you can create consistent variations and image edits with just prompting. The edits are impressive, and that too without the use of any ControlNet or adapters. You can also get creative edits with non-AI-generated images, like this actual photo. Today's tutorial focuses on the CosXL and CosXL Edit workflows, a unique upscaling method, advanced model merging, and some of my hacks and tricks in ComfyUI. Firstly, I would like to thank all the paid channel members for supporting the channel. Also, we just crossed 10,000 subs, so thank you for that.

CosXL is a new model released by Stability AI for research purposes only. This model is tuned on the Cosine-Continuous EDM VPred schedule, which controls the amount of noise added or removed at each step of image generation using a continuous cosine function. The benefit is that the model can capture a broader spectrum of color, including true blacks and pure whites. The second part of this model is CosXL Edit, which is based on InstructPix2Pix. Pix2pix refers to a general class of deep learning models that can translate one type of image into another; "instruct" implies that the pix2pix model in CosXL Edit can take instructions through text prompts.

This is not a basic tutorial. I use some techniques that require ComfyUI know-how, along with ControlNet usage. Also note that there is a learning curve for getting good output results using CosXL and CosXL Edit, so please watch the tutorial, as the DualCFGGuider, KSampler, and BasicScheduler values have to be changed constantly depending on the prompt and desired output. I will also showcase a new upscaling technique never shown before on the channel using Ultimate SD Upscale, and the last part of
the tutorial will focus on how to convert any checkpoint into a CosXL checkpoint via advanced merging.

The Switch and Integer nodes from the Impact Pack are used in the workflow. ControlNet is required for upscaling. The workflow will also use the latent and Ultimate SD upscales. Some utility nodes from the Comfyroll Studio are also required, the SDXL Prompt Styler is needed for the CosXL Edit workflow, and I use the Float node from ComfyLogic to edit values in the advanced merger workflow. Lastly, many rgthree nodes are used for workflow optimization and ease of use. Please note that these are gated models, and you need to fill in your details before access is granted. You need to download the CosXL and CosXL Edit models from Hugging Face; they go into the models/checkpoints folder. I am also using the TTPlanet tile ControlNet model, which is available from Civitai. The page links will be in the description below.

I will start with the basic CosXL workflow and then move on from there. Add a Load Checkpoint and a Self-Attention Guidance node. This node is now basic and should be included in all workflows; it just gives better and more detailed outputs. Make the connection and add a LoRA stack loader node, and connect the model and the CLIP. Since CosXL is similar to the base SDXL model, you get better results connecting it with an SDXL CLIP conditioning. Convert both the text_g and text_l widgets to inputs and connect them with a Primitive node; this will be the positive conditioning. Duplicate both the nodes for the negative and make the CLIP connection. For CosXL, it's advisable to connect with the SamplerCustomAdvanced rather than the standard KSampler. I am going to drag out each input and connect the relevant nodes. First, the Random Noise: this is the seed from the KSampler. Convert the noise seed widget to an input and connect it with the Seed node from rgthree. The Dual CFG is basically used for InstructPix2Pix in CosXL Edit; I am including it for CosXL because it gives finer details. The
CFG conditioning value works like your normal CFG. Conditioning two and negative are both negative, and changing this value in CosXL Edit has a drastic effect on the output; however, for the CosXL model, whether you keep this value at 0 or 100, there will be no change in the output. For CosXL, you only need to change the CFG conditioning; it works like your normal CFG. Later on, I will show you a comparison of the output with a single CFG and the Dual CFG guider. Connect the LoRA output model with the Dual CFG, the positive to conditioning one, and the negative to both the conditioning two and the negative inputs. I am changing this value to three, as that's the default for CosXL Edit, and we need to duplicate these nodes for the edit workflow; again, this value has zero effect on CosXL text generations.

Add the KSampler Select node. I prefer the DPM++ 2M SDE sampler, which I will use for the tutorial. The sigmas come from the BasicScheduler, which includes the number of steps as well. For CosXL, I will stick to Karras with 40 steps, and the denoise value should be one. Connect the scheduler with the LoRA model output. For the latent input, I will use the more convenient SDXL Empty Latent node from rgthree. Connect the sampler output to the standard VAE Decode node, along with the preview and save. This completes the single-pass CosXL workflow. Put in a positive and negative prompt and let's hit Queue Prompt. Mind you, the tonal color range is very wide; get creative with the prompts to get beautiful color tones. For the base CosXL, don't use too many negative embeddings.

Put all these nodes under a single group called "CosXL Base". Arrange and color-code the nodes as desired. For the workflow, we need this group to turn on and off with a click; for that, add a Fast Groups Muter node from rgthree. Right-click and go to properties, rename the title, change the color if needed, and type the group name "CosXL Base" into the match title field. We can now disable and enable this group when required. This is not part of
the workflow, but I want to show you why a Dual CFG node was used for the CosXL base. I am going to duplicate the sampler and reconnect everything except the guider, adding an Image Compare node for comparing the outputs. Drag out the input and connect it with a CFGGuider node, make the connections, and hit Queue Prompt. The image coming from the right is from the Dual CFG. Let's zoom in, and you can see it's more refined; you can even see the detail and clarity in the bokeh background.

To refine the image, you can process it via a second sampler pass, and while at it, we can also do a 1.5x latent upscale. Duplicate these nodes and put them above the base group. Drag out the advanced custom sampler output and add a Latent Upscale By node. Change the upscale method to bislerp. Slerp stands for spherical linear interpolation; bislerp is basically an advanced version of slerp, offering better interpolation, and is better than nearest-exact. Connect the latent upscale with the custom sampler. Here, ensure that the denoise value is less than one. I have done a lot of testing on CosXL and the merged RealVisXL checkpoint, and a good range would be somewhere between 0.6 and 0.7. Also note that the sampler selected should be the same for both passes, and typically keep the scheduler the same as well. However, you may get undesired results with CosXL; whenever the second pass changes the image a lot, select exponential as a second choice for the scheduler, then go with SGM uniform or normal as a third choice. How much the image changes depends on the prompt. Lastly, ensure that the number of steps is the same as in the first pass. Organize these nodes in a group called "CosXL Refine". The second pass is way better and upscaled as well, but you can clearly see the changes: the face is the same, but the structure is slightly different; it fixed the neck and the eyes. Play around with other schedulers to get some creative results. Before we connect this to the pixel upscale, let's ensure that this group
can be enabled and disabled as well. Right-click the rgthree node we created earlier; in the properties panel, change the match title to include the new group "CosXL Refine". Sort-by should be custom alphabet; type in the following. This allows us to use or skip the second pass.

The next step is adding and connecting the Ultimate SD Upscale with both outputs. I am going to add a Switch from the Impact Pack; this will allow you to choose the raw or refined output for the upscale. Connect the base VAE output first, then connect the output from the refine VAE to input 2. Rename the node and the inputs as shown. Add an Upscale Image (using Model) node and connect it with the switch and the Upscale Model Loader node. The 4x Foolhardy Remacri model does a pretty good job here; I will leave the Hugging Face link in the description below. This is going to do an image-to-image upscale by 4x. On its own this won't be that good; the point is that it adds some minor pixel detail, so when we put the image through Ultimate SD Upscale, the result is still better than without this step. Since we used a 4x model, we need to downscale the image back to the original resolution, so add an Upscale Image By node, change the upscale method to Lanczos, and set the scale-by value to 0.25; this will downscale it by 4x.

Now add the Ultimate SD Upscale node and connect it with the downscaled output. Connect the model as well; ensure you are connecting with the LoRA output. Now add a standard ControlNet node here and connect the positive conditioning to the ControlNet. Ensure that you connect the non-upscaled image output with the ControlNet. Now connect the ControlNet loader and select version two of the tile ControlNet model. We only need the positive conditioning to pass through the ControlNet. Connect the upscale model, the negative conditioning, and the VAE. Right-click and convert the seed to an input, then connect it with the rgthree Seed node. Convert the tile width and height to inputs as well; these will then connect to an Integer node from the Impact
Pack. Rename the node to "Tile Size". I am using this on a 4090 with a pretty hefty overall system; anything lower than this resolution will reduce quality, and the higher the better. This upscale is not optimized for speed but instead designed for quality, so use this value only if you have 24 GB of VRAM or higher. If low on VRAM, stick to lower resolutions as per your GPU; you can go as low as 512. I am going to upscale this at 4x; if you want quicker generations, keep this value at 2x. Keep the CFG at 7 and change the sampler and scheduler as shown. Keep the denoise value at 0.3; we want the upscaler to add some details. Here is an image generated using a denoise value of 0.3. CosXL uses a wide range of color tones, and since we are using a tile-based upscale, this sometimes causes a color disconnect between the tiles, resulting in color banding and oversaturation across the image. You can avoid this by reducing the denoise to 0.10; however, I would not recommend that unless you hit this issue. Change the seam fix mode to "Half Tile + Intersections" and let all the other settings remain default. Duplicate and connect the Upscale Image By node; we want to reduce this 4x upscale by 2x for a sharper output, so change the value to 0.5. If you are using a 2x upscale in Ultimate SD Upscale, bypass this node. Organize and group these nodes and rename the group to "Base Upscale"; this will allow us to pass any one output on to CosXL Edit. Duplicate the rgthree node and place it in a convenient location. Rename the node to "CosXL Upscale", then go to the properties and make the appropriate changes to include only the upscale group.

So let's summarize the organization. You first enable only the CosXL base and run Queue Prompt; then enable the refine group, change the settings, and hit Queue Prompt again. You can fine-tune the refine or proceed with the upscale. Enable the base upscale, choose raw or refined, and then hit Queue Prompt. Note that at these settings the upscale takes about 1,050 seconds on average, or about
17 and a half minutes. As I said earlier, the upscale is designed for quality, not speed. Skipping the refining process, using a 2x upscale, and using a smaller tile size will all reduce the time drastically. This upscaling method is slow but stable; it uses about 70% of the 24 GB of VRAM and around 50 to 60% of the 64 GB of system RAM. Let me show you a couple more examples. All of these have gone through the latent refinement and then the SD upscale. You can see the color tones and range this model can produce, and the upscale quality. In most of the images I can get these deep blacks; if you are watching this on an OLED, you will be completely blown away by the blacks and contrast. And have a look at the brush details: it looks like a DSLR photo rather than AI-generated. These have zero post-processing and are exported as just 8-bit PNGs.

Add a Switch from the Impact Pack; this gives the flexibility to choose the raw, refined, or upscaled output image. Connect the switch with each of the outputs and rename the node and the inputs as per the connections. Connect this with the Image Scale To Total Pixels node and change the upscale method to Lanczos. This node will resize the image to 1 megapixel. The reason for this is that CosXL Edit works only at your SDXL resolution, mostly 1024. So you may ask: why connect the upscale even though the image would be the same? You will get a different output, because we upscale at a denoise value of 0.3, so some new pixel data is always added; secondly, you may want a sharper image for CosXL Edit. The switch takes care of the output from the CosXL base; however, for CosXL Edit you can also load a custom image. Add a Load Image node and connect it with another Image Scale To Total Pixels node. Now add another switch that connects to both these inputs; rename the node and inputs as shown.

Now copy and duplicate everything from the CosXL base group. Change the checkpoint to CosXL Edit and delete the positive and negative Primitive nodes. Now I will add a text prompt switcher from the Comfyroll Studio; this will allow easy switching between a custom prompt and the SDXL Prompt Styler. This will be the positive switch; connect the string to both the text_g and text_l inputs. Rename the switch inputs and the node as shown, then duplicate, connect, and rename for the negative. Add the SDXL Prompt Styler and connect it with the positive and negative. Now add a Comfyroll Text node and connect it with the positive, then duplicate it and connect it with the negative. Rename the nodes' titles to indicate the positive and negative. Convert the positive and negative switch input-selection widgets to inputs and connect these with an Integer node from the Impact Pack. Here, the value can be one or two; anything higher than two will be treated as two. Change the node title to reflect the same. You can select one for the custom prompt and two for the SDXL Prompt Styler.

Add the InstructPixToPix conditioning node between the conditioning and the DualCFGGuider, and connect the positive and the negative. For the output, connect the positive to conditioning one and the negative to conditioning two; the negative should come directly from the CLIP Text Encode and not the InstructPixToPix output. The latent connects with the sampler's latent image input. Connect the image input switch as shown, and also connect the VAE. Delete both the preview images and add an Image Compare node; connect the VAE Decode to image A and the image input switch node to image B. Color and organize the nodes into a group and name this group "CosXL Edit". We should add this group to the previously created CosXL on/off node: right-click, go to properties, and add the group name to the match title. Duplicate and place a copy at a convenient location below the CosXL Edit group.

Let me explain the parameters, as the Dual CFG works differently with the edit model. First select the correct output image, then the correct input image. Let's try a positive prompt: I want to edit the image to show the character in gothic makeup with a Harley Quinn-style face tattoo. I won't use any
negative prompts for this one. You can even go with very simple prompts: if you want to change the lighting of a scene from day to night, just put "night" in the positive or "day" in the negative. Ensure that the prompt switch value is correct. The default Dual CFG values are five for the conditioning and 1.5 for the negative; they are correlated, and I'll explain with a live example. The only good samplers I recommend for the edit are the ancestral ones, so try Euler Ancestral, DPM2 Ancestral, and DPM++ 2S Ancestral; these are the most effective with the edit model. For the scheduler, all of them can be used in combination for creative results; I recommend trying exponential and SGM uniform, as they can yield better results than the standard Karras. Keep the steps at 20, as the step count is way more sensitive with the edit model; make increments of five to get very different results, and higher is not always better here. Denoise will always be at one. Let's try and see what we get at these settings.

The edit is extremely strong here. If you increase the CFG conditioning, the edit gets stronger; if you reduce the value, the edit becomes weaker and the image will be closer to the original. As per my testing, for facial edits a value of 2 to 3 should suffice; for other edits, you can go as high as 8. The conditioning two and negative value works in reverse: increasing this value will weaken the edit, and decreasing it will make it stronger; it is also way more sensitive than the conditioning. Let's reduce the value from 5 to 2.6. Way, way better. Let's have a closer look. Okay, so this is amazing, and believe me, it's impossible without CosXL Edit. I have tested so many edit generations with this model, and the way it maintains consistency amazes me every time; this level of editing takes days in Photoshop.

Let's try the SDXL Prompt Styler with this same image. You can of course put additional positive and negative prompts, but I will leave them blank for now. Let's try a different value and scheduler. I picked a face to showcase the
level of consistency; without knowing the proper settings, you will get inconsistent results. Faces are hard, so if the face changes, remember to reduce the CFG value to close to two. Now let's try a non-AI image: this is a photo taken in Iceland. Switch the input to the custom image. I am changing the prompt, and this time I will put in a negative prompt as well; ensure that the prompt switch is set to custom. I will go with the default value since this is a landscape, and will increase the steps to 25. You can see I have made quite a drastic change, and even for a non-CosXL image it maintains consistency very well: it completely replaced the background with the clouds and even extended the road.

A latent upscale with CosXL Edit doesn't make any sense; it's hit or miss, and the image changes a lot, so it's not workable for non-AI images. Even for images generated via CosXL, the model somehow does not perform well, so I only recommend using the Ultimate SD Upscale. Copy all the nodes from the base upscale group except for the switch. Paste the nodes below the CosXL Edit workflow, right at the end. Connect the VAE Decode with the upscale-by-model node and the ControlNet. Duplicate the compare node and connect image A with the newly copied image downscale output. Organize the upscale nodes in a group and rename this group to "Edit Upscale". Now go to the CosXL Upscale enable/disable node and add the Edit Upscale group; this gives the flexibility to disable the edit upscale if not required.

With advanced model merging, we can convert any checkpoint into a CosXL checkpoint. This means you get all the fine-tuned properties of the third-party checkpoint with the color tone depth and range of CosXL. Add three Load Checkpoint nodes: the first should be CosXL, the second the base SDXL, and the third your third-party fine-tuned checkpoint, RealVisXL in this case. This is how it works: we take CosXL and subtract the SDXL base from it, which leaves us with only CosXL without the base SDXL.
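The subtract-then-add merge just described can be pictured as element-wise arithmetic on checkpoint weights. Here is a minimal sketch with toy one-entry dicts standing in for real state dicts; the actual ComfyUI ModelMergeSubtract/ModelMergeAdd nodes patch model weights internally, and the names and values below are illustrative only:

```python
def merge_subtract(a, b, multiplier=1.0):
    # a - multiplier * b, key by key (what ModelMergeSubtract does conceptually)
    return {k: a[k] - multiplier * b[k] for k in a}

def merge_add(a, b):
    # a + b, key by key (what ModelMergeAdd does conceptually)
    return {k: a[k] + b[k] for k in a}

# toy one-weight "checkpoints" (hypothetical values)
cosxl, sdxl, realvis = {"w": 0.9}, {"w": 0.5}, {"w": 0.7}

# CosXL minus its SDXL base isolates what CosXL adds on top of SDXL...
delta = merge_subtract(cosxl, sdxl)
# ...and adding the fine-tuned checkpoint puts that on top of RealVisXL
merged = merge_add(delta, realvis)
```

The same arithmetic is applied to both the model and the CLIP, which is why the workflow mirrors the subtract/add pair with CLIPMergeSubtract and CLIPMergeAdd.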
Then we take RealVisXL and add it to the CosXL. We will do this for both the model and the CLIP, with the model requiring some extra nodes. Search and add the CLIPMergeSubtract node: the CosXL CLIP goes to clip1 and SDXL to clip2. Now add the CLIPMergeAdd node: the output from the subtract goes to clip1 and RealVisXL to clip2. The same for the model: add ModelMergeSubtract and connect CosXL to model1 and SDXL to model2; add ModelMergeAdd and connect the subtract to model1 and RealVisXL to model2. Now add the ModelMergeSDXL node, with the add going to model1 and RealVisXL to model2, and connect ModelMergeSDXL with the ModelSamplingContinuousEDM node.

Let's understand the crucial ModelMergeSDXL node. It has input, middle, and output blocks. Blocks here refer to groups of layers in the neural network, such as the input, middle (hidden), and output layers. The values here are the ratios in which each group of blocks is combined. So technically you can have a ratio of 50% for both inputs, 75% model 1 and 25% model 2 for the middle layers, and mix 70% of model 1's output with 30% of model 2's output layer. In simple English, the value of each block makes a specific model influence the generation more or less. Let's clarify what model 1 and model 2 are: model 1 here is CosXL minus SDXL plus RealVisXL, whereas model 2 is RealVisXL. This is an advanced merge and not a simple merge, so under no circumstances will you get a purely model 1 or model 2 output here, as everything is mixed via the input, middle, and output blocks. A value of one means the output is more influenced by model 1, a value of zero means it is more influenced by model 2, and a value of 0.5 means both influence the generation equally. You can play around with each value, though the number of combinations is too many to test; some will break the generation and some will be stable. To get the most stable output, you only have to modify output blocks 0 to 4. For simplicity's sake, I will assign a single value to all five blocks. This means
that both models 1 and 2 will have input, but the output can be influenced equally or more by any one of them. Right-click and convert all five output blocks to inputs. Add a Float node from ComfyLogic, connect it, and rename it as shown. I am going to add two Switch (Any) nodes, one for the model and one for the CLIP. The merged model and CLIP will be input one, and CosXL will be input two for both switches. After renaming, convert both select values to inputs, add an Impact Integer node, and connect them; the value can be one or two. Rename the node to clarify the same. Change the value of sigma_max to 240; this gives you slightly more detailed outputs. I have tested and experimented with these values.

For the output block, let's understand the formula: model 1 × ratio + model 2 × (1 − ratio). So a value of one would mean 100% model 1 influence, and a value of 0.25 would mean 25% model 1 plus 75% model 2 output influence. I suggest you take a screenshot of this for reference. You can try out all five values and check which gives the best output with your checkpoint. I will be using a value of 0.5 for a balanced output.

Add a Checkpoint Save node and connect the model and clip outputs as shown. Ensure that the VAE connection is from the CosXL checkpoint. If you hit Queue Prompt, this will create a folder called "checkpoint" in the ComfyUI output folder and save the merged checkpoint as a safetensors file. Mute the node for now; the workflow is set up in a way that you can run the generation and test the merged checkpoint, and once satisfied with the model merging, enable this Checkpoint Save to save the safetensors.

Let's first generate using the base CosXL again; now change the input to the merged model. One thing to note here is that the sampler and scheduler should be aligned more toward RealVisXL, or whatever your fine-tuned checkpoint is. For this specific checkpoint, change the sampler to DPM++ 2M; the steps should be around 25 to 35. Reduce the CFG from 8 to 5; you can go as low as two, but
keep five as the default. If the image is overburnt or oversaturated, reduce the CFG further. These settings cannot be constant; during testing I usually ended up manipulating the CFG values and steps to get the desired output, so take note of that.

When upscaling, there might be a slight color loss. To fix this, you need the Color Match node from KJNodes. Add the node after the upscale output; connect the input image as the reference and the upscaled output as the target. Change the method to hm-mvgd-hm, and that's it: this will ensure there is no loss of color after the upscale. Also make sure that you connect this node with the switch in the CosXL Edit group.

I hope this tutorial was helpful and you learned something new in ComfyUI. Until next time.
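As a footnote on the merge math: the output-block formula quoted earlier, model 1 × ratio + model 2 × (1 − ratio), is plain linear interpolation applied per block. A minimal sketch with hypothetical single weights (real blocks are whole tensors, but the mixing rule is the same):

```python
def blend_block(w1, w2, ratio):
    # ModelMergeSDXL-style per-block mix: model 1 * ratio + model 2 * (1 - ratio)
    return w1 * ratio + w2 * (1.0 - ratio)

# hypothetical weights for one output block of model 1 and model 2
m1, m2 = 0.8, 0.2

full_m1  = blend_block(m1, m2, 1.0)   # ratio 1.0 -> pure model 1 influence
quarter  = blend_block(m1, m2, 0.25)  # 25% model 1 + 75% model 2
balanced = blend_block(m1, m2, 0.5)   # the 0.5 used in the tutorial
```

At a ratio of 0.5 each model contributes equally to that block, which is why 0.5 was suggested as the balanced starting point before experimenting per block.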
Info
Channel: ControlAltAI
Views: 3,053
Keywords: cosxl, cosxl edit, instructpix2pix, comfyui, comfy ui, comfyui stable diffusion, comfyui upscale, comfyui workshop, comfyui nodes, comfyui ultimate sd upscale, custom nodes, comfyui workflow, pix2pix, comfyui upscale latent, sdxl tile controlnet, comfy model merge, comfyui model merge, cosxl workflow, ultimate sd upscale stable diffusion, model merge, comfyui node workflow, ultimate sd upscale comfyui, comfyui controlnet
Id: mey08xsgybE
Length: 45min 53sec (2753 seconds)
Published: Sat May 04 2024