ComfyUI: Scaling-UP Image Restoration, SUPIR (Workflow Tutorial)

Video Statistics and Information

Captions
Hi, I am Seth. Welcome to the channel. This is a 445 by 445 resolution image. Using a series of techniques, I could restore the image in steps with high likeness. Once I get the desired details, I do a quick image-to-image upscale to 12,000 pixels. Besides restoration, I use the workflow to enhance and upscale AI-generated images while maintaining consistency. If I zoom in on the eyes and the lip area, you can see the enhancement more clearly; this image is upscaled approximately 9x. Let me show you my technique, the workflow build, and some hacks I use in ComfyUI. A special thank you to the paid channel members for their support, and a big shout out to Kijai for the SUPIR implementation in ComfyUI. I will leave the GitHub links for the node and the original SUPIR repository; please check them out for the appropriate license usage.

SUPIR, also known as Scaling-UP Image Restoration, uses generative priors. Generative priors are powerful machine learning diffusion models trained on vast datasets of images; they learn the statistical patterns of what real images should look like. A distinct feature of SUPIR is that instead of embodying the generative prior, it leverages a pre-trained generative prior as a guiding tool during the image restoration process. During restoration, it iteratively refines an image by injecting noise and then reversing the noise injection process; the guidance from the diffusion model prior helps ensure the restored image is realistic and detailed. So basically, you can use SUPIR to restore old photos and improve the quality and details of AI photos. It is a bit resource intensive, as the ComfyUI implementation is just a wrapper, and remember, it is a work in progress; Kijai frequently updates the custom node, and any changes or improvements will be posted in the pinned comments. To get the most out of SUPIR, I recommend a minimum of 12 GB VRAM and 32 GB system RAM. The tutorial, however, focuses on core concepts that are useful for SUPIR and other ComfyUI workflows in general. This is an advanced tutorial that requires prior knowledge of ComfyUI; I will keep it as simple as possible, but knowing the basics is necessary.

Let's start off with the nodes I used for the workflow. I use the Integer node from the Impact Pack; the Math Expression node from pythongosssss is used for some math expressions, and the Ultimate SD Upscale node helps with the image-to-image upscaling. ComfyMath nodes are crucial for some calculations for the image resolution, tile size, and tile stride. rgthree nodes help with the seed, workflow optimization, and organization. The Essentials custom nodes are used to get and resize the image resolution, and the Color Match node is from KJNodes. ComfyUI-SUPIR is the primary node here. I also use Gemini to get image descriptions. You need to download the SUPIR models from Hugging Face; these go in the models/checkpoints folder within ComfyUI.

First, let's build the basic workflow and understand how SUPIR works. To start, add the Load Checkpoint node. I'll use the RealVisXL checkpoint here, but you can use any SDXL checkpoint suited to the image; Juggernaut works very well. Now add the SUPIR Model Loader, then the First Stage (Denoiser), SUPIR Encode, Conditioner, Sampler, and Decode nodes. The Color Match node maintains the original colors, and finally add the Image Comparer. Now let's connect them. Connect the checkpoint with the model loader and add a Load Image node. The image will first go through the denoise stage: the model VAE connects with the denoiser, and then the encode. The latent output from the denoiser goes to the conditioner; connect the model as well, and the encode to the sampler. The model also connects with the sampler, along with the positive and negative conditioning, and the latent and VAE connect with the decode node. Connect the output to image A and the original to B for the comparison. For the color match, the output from the decode will be the image target and the original will be the reference.

I will explain some basics first. There are two models for SUPIR: one is the Q model and the other, F, is fine-tuned. I will only use the Q model for the tutorial, which works best on all images; the fine-tuned model performs better on some images and worse on others, and due to the inconsistency I prefer not to cherry-pick the examples. SUPIR uses a UNet-like architecture as part of its restoration process. Enabling fp8 instructs the model to use 8-bit floating-point precision for the weights and calculations within this UNet component. fp8 requires nearly half the memory of fp16 precision, and by memory I mean VRAM, so enabling this setting can be crucial if you are low on VRAM. Please note that there is a slight decrease in accuracy compared to fp16 or fp32. For the tutorial, my system will run the model at 16-bit floating-point precision; the test system I am using has 64 GB of system RAM and 24 GB of VRAM. The diffusion dtype (floating-point precision) is best left on auto, and this setting should be the same across the first stage and encode nodes as well. fp16 is half precision and has a smaller memory footprint than full-precision fp32; on my 4090 it does the job well, whereas I got plenty of crashes with fp32 with a 2K-high tile size upscale. bf16 is called brain floating point. The key difference is that it sacrifices some precision in the fraction compared to fp16 in exchange for a wider exponent range, mitigating some limitations of fp16's reduced range. In other words, it is close to fp32 accuracy in many scenarios while maintaining the same memory footprint as fp16. bf16 is only supported on the latest hardware. I would still leave the setting on auto for now.

Tiled VAE (variational autoencoder) is a process where the image is divided into smaller overlapping tiles, and each tile is then processed independently by the VAE. This is very memory effective; while there might be slight boundary artifacts, the trade-off in terms of memory usage and scalability is significant. I recommend enabling tiled VAE for all SUPIR nodes. The encoder and decoder tile size is the size of the tile generated for the VAE encoding and decoding process. Although smaller tiles have less impact on the VRAM, going too small is counterintuitive for the system RAM; I will explain this a bit later in the tutorial. I will also teach you how to have three tile settings using math nodes, which dynamically change the tile size depending on the image size and upscale factor.

The conditioner is the positive and negative conditioning used to guide the model. Sometimes it is recommended to use the default prompt, but sometimes you can add some keywords to the positive prompt. This plays a critical role in the output, and I will explain later, with examples, how to use this correctly to your advantage. At the time of making this video, the captions were not fully implemented; for now, the node just takes the text string and adds it to the positive prompt, but according to the dev it will eventually be able to caption each tile automatically. Seed plays a very important role here: since SUPIR injects noise for upscaling and enhancement, randomizing the seed will give different results. Right-click and convert the seed to an input, then connect it with the rgthree Seed node. I am using a constant seed to maintain consistency throughout the tutorial. The number of steps should be according to the checkpoint used; for RealVisXL, 25 steps are appropriate, and you can also use the Juggernaut Lightning version with just 10 steps. CFG start and end is the guidance value at the beginning and end of the steps; it should also be as per the checkpoint used. You can experiment with a higher start and lower end value, and vice versa; if the details are too strong, you have used a very high CFG. I prefer that both values be the same, so I will use seven here.

EDM stands for error diffusion model. SUPIR cleverly incorporates an error diffusion model alongside its primary diffusion-based restoration process. The EDM sampling scheduler dictates the iterative workflow for error manipulation; each iteration involves controlled modification and redistribution of quantization errors across the image. This process aims to achieve a visually smoother and more refined output than relying solely on the diffusion model. s_churn is a hyperparameter: a higher value means aggressive error manipulation but a higher risk of introducing artifacts, while a lower value indicates a more gradual and controlled error diffusion process. In other words, use a lower value for consistency and lack of variation; I will be using a value of one for the entire tutorial. s_noise is another hyperparameter: it controls the amount of noise introduced during the EDM's diffusion process. A higher value introduces more noise and a lower value introduces less, which impacts the clarity and smoothness of the output; I will reduce this value to a minimum of 1.01. Similarly, the eta value acts as a scaling factor for the noise schedule, with one being the default. A lower value means less randomness, and a higher value will give a more creative output; for higher consistency, I would choose a value between 0.1 and 0.5.

SUPIR has its own powerful ControlNet. The start and end values indicate the strength of the ControlNet applied; for balanced consistency, I recommend both values at one. The lower you go, the more creative the output. You can try values of 0.9 for the start and 0.95 for the end, or something like starting at one with a lower value for the end; these give somewhat consistent but fine-tuned results and are image dependent. For the tutorial, I will stick to a value of one. With my current settings, I could find no difference between one and the max value of 20 for restore CFG, so I will leave the restore CFG value at one for now. As the name suggests, keeping the model loaded keeps it in memory. This comes in handy when you want to retry the generation using a different seed or do different images with the same settings. However, I will disable this, as the workflow has some image-to-image upscaling after the SUPIR nodes; disabling it frees up the VRAM as soon as it's done generating the output. There are four samplers, and DPM++ 2M is the best for most cases.
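The fp8/fp16/bf16 trade-off discussed above comes down to bytes per weight. Here is a minimal sketch of the arithmetic; the 2.6-billion parameter count is a rough assumption for an SDXL-class UNet (the exact count varies by checkpoint), and only model weights are counted, not activations, the VAE, or text encoders:

```python
# Approximate weight memory at different floating-point precisions.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}

def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return approximate weight memory in GiB for a given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3

params = 2_600_000_000  # rough SDXL-class UNet parameter count (assumption)
for dtype in ("fp32", "bf16", "fp16", "fp8"):
    print(f"{dtype}: {weight_memory_gb(params, dtype):.1f} GB")
# fp32: 9.7 GB, bf16: 4.8 GB, fp16: 4.8 GB, fp8: 2.4 GB
```

This is why fp16 halves the VRAM footprint of fp32, fp8 halves it again, and bf16 costs the same memory as fp16 while trading fraction bits for exponent range.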
The EDM sampler, however, works better at higher than 2x upscale. The tiled versions mean the sampling is applied in a tiled manner; tiling helps with the load, but can result in undesired outputs if not done correctly. I have already explained what tile size is; the sampler tile size has the most effect on VRAM. The tile stride determines whether you want the tiles to overlap or not. I will explain this with an example; for now, let's keep it at the default value. SUPIR denoises the image and builds on it. If you add a preview node at the stage one output, you can see where SUPIR builds up from. As you can see, the image has fewer details but maintains many details from the original; everything is added onto this denoised image, so it already has a lot of detail to start with.

Now let's understand tile size and stride in the sampling process. To visualize it, add a SUPIR Tiles node with a preview. The size here is directly correlated to the size of the image. This image is 1024 in width and height; at a tile size of 1024, you see only a single image, which the sampler will process. This means there is no stitching or overlapping, assuming the upscaling factor is 1x. The best quality is achieved when the tile size is the same resolution as the output, meaning if I am upscaling at 2x, then a 2048 tile size would be better than a 1024 tile size; however, a higher tile size is very VRAM intensive. Assuming the output is 1x, this is what happens with the tile stride and size: a 256 tile stride on a 512 tile size means that the tiles overlap by 50%. Remember this 50% overlap; it's very important. SUPIR works by perfectly blending the overlapping images without causing a seam. A 75% overlap also works, but I got the most success using a 50% ratio. See what happens when the tile size and stride are the same value: there is no overlap, and if you process it this way, you will get a clear-cut seam at the joints. This is why overlapping is important. Another thing about optimization: a higher tile size uses more VRAM, but a higher number of tiles stresses the system RAM, so you have to strike a balance. Again, this is all related to the original image size and the desired output size. This is important because you may say, let's just reduce both these values even further to save more VRAM, but look what happens when I do that: now it will process 49 images instead of nine. This is counterproductive; it will not only reduce the quality due to over-tiling but also require very high system RAM.

You cannot take any image and upscale it to any resolution. First, the resolution should be divisible by 8 for ComfyUI; second, SUPIR requires the number to be divisible by 32. To complicate things, some numbers are not divisible by both 8 and 32. Images generated via Stable Diffusion should mostly be fine, but let's take an image off the internet; this one is 375 by 500. Running it through the default tile size and sampler, it goes fine. However, if I change it to a tiled sampler, or change the tile size and stride to a custom resolution, it will error out. For example, say you take a 1024 image and want to upscale it to 1512: ComfyUI will process it, but SUPIR will give an error. Manually calculating an image size and upscaling resolution compatible with both is cumbersome.

To autocorrect the resolution, we first need to get the loaded image resolution, so add a Get Image Size node. A math expression can check whether the image is divisible by 32. Connect the width to a; from the b input, drag out, search, and add the Impact Integer node, then change the value to 32. Now convert the expression to an input and connect it with a primitive node. To check divisibility, we divide the width by 32, so the expression would be a/b. After you queue the prompt, you will get a number here; if it is a round number without decimals, the width is divisible by 32. Since I got a decimal number, the loaded image is not divisible by 32. Replicate the math expression for the height and do the connection. Connect the width and height with display integer nodes; this will show you the actual resolution of the loaded image. Using an Image Resize node, we can get a resolution that is divisible by 32. This node is only used to get the proper divisible resolution, not to actually resize the image. Convert the width and the height to inputs and make the connections, select lanczos as the interpolation, and in multiple_of, change the value to 32. Duplicate the display integer nodes and connect them with the width and height from the image resize. The original size is 445, whereas the correct divisible size is 416. Crop the original using an Image Crop node: convert the width and height to inputs again and connect them with the image resize node. I will keep organizing the nodes in groups as it gets messy; organize the nodes further with appropriate titles. To change a title, right-click on the node and rename it; you can change the input or output titles the same way.

Let's add the math nodes for the upscale factor. Duplicate the image resize and get image size nodes, and change the multiple_of value back to zero. Add an Integer Binary Operation math node, convert a and b to inputs, then duplicate it and connect it with the height. Change the operation to multiply. Connect width and height to the a inputs as shown. Duplicate the Impact Integer node and connect it with the b inputs; this will be the upscale factor. Add the display integer nodes to the image resize. Now you can just change one value to 2x, 3x, etc., and multiply the resolution by that factor. Organize these nodes into a group called upscale factor. Using the same technique, create a group of nodes for downscaling with the binary operation nodes; instead of multiply, use divide. From the original get image size node, connect the width and the height to the a inputs for the downscale group nodes. Group and organize these nodes as well.

Okay, so we have four groups here. This group gives you the resolution; we connect that to a node which converts the resolution to a number divisible by 32. If it is not divisible by 32, it will resize and crop; if it is divisible, the crop will be the same resolution. Over here we have the downscale nodes; this is useful for loading and downscaling any image. The nodes here are called the upscale factor nodes. By upscale factor, it does not necessarily mean it has to upscale; the upscale factor can be one, so for example, when downscaling, we will change the upscale factor to one. What I will do now is first connect the downscale and original crop nodes via a switch and pass them as the source of the get image size in the upscale factor group. This way, you can switch between taking the original image or the downscaled image. Let's do that and test it; then I will add a third option for a custom resolution. Add an Impact Switch (Any) node. Connect the image resize output from the downscale to input one and the crop output from the crop group to input two. For clarity, rename this to "select input" and rename the inputs as shown. Connect this switch to the get image size from the upscale group, and ensure that the crop image output connects with the image resize downscale node input. Now you can use this node to downscale an image by selecting one, or use the original image by selecting two. To add the custom resolution option, duplicate the image resize node from the bottom group and ensure that keep_proportion is set to true. Connect the image crop output to its image input. Duplicate the Impact Integer node again, rename it to custom resolution, put in a value, say 1500, and connect it to the width and height. Organize the two nodes into a group. Connect this output to switch input 3 and rename the input slot to custom. For the custom resolution image resize node, ensure the multiple_of value is set to 32. What this will do is, whatever custom resolution you put, say 1500, it will take the nearest number to 1500 which is divisible by 8 and 32 and push that resolution to the upscale factor nodes. A note: when you downscale or use a custom resolution, ensure the upscale value is one unless intended otherwise.

To connect the math nodes, we need to finish the tiling nodes. Add the SUPIR Tiles node just before the SUPIR nodes group; the upscale factor image output connects with the SUPIR tiles input. Group the SUPIR tile nodes. Now you should replace the image connection with the SUPIR nodes: first connect the image resize output from the upscale factor to the denoiser, then connect it with the image reference in color match and the image B input in compare. Right-click and convert the encoder and decoder tile sizes to inputs; do the same for the encode node, the decode node, and the sampler as well. The tile size connects with all the tile size inputs we just converted; the tile stride output connects only to the tile stride input in the sampler. Now right-click and convert the tile size and stride to inputs.

I will add three options for automatically calculating the tile size and stride. This will make the workflow completely dynamic, and you can change tile values with a click. The first set will be called full; this calculates the sizes according to the upscale factor and will be the most VRAM intensive but will give the best quality output. Add an integer binary operation, convert only a to an input, and change the operation to division. Rename the output to tile size. Duplicate the node; this one's output will be tile stride. Connect the width from the upscale factor output to both nodes, then add and connect the display integer nodes. The tile size divisor value here would be one; we calculate all the sizes based on the image width, so if the image is 1024 and the upscale factor is 2x, the tile size would be 2048. The tile stride will always be 50% of the tile size for overlapping, so the divisor for this node would be two. The second set will be called standard; here the tile size will always be 50% of the upscale factor resolution. The tile size divisor would be two and the stride divisor would be four. The third group will be the default values; it gives a constant value irrespective of the upscale factor. Add two Impact Integer nodes; the default values are 1024 for the tile size and 512 for the tile stride. Add two Any Switch nodes from rgthree, one for the tile size and the other for the tile stride, and connect them with SUPIR tiles. This switch passes on only active inputs, so only one of the three tile groups should be enabled at any given time. Now connect the tile group outputs as shown over here. Add a Fast Groups Muter node from rgthree; we can configure it to act as an on-and-off switch for the tile groups. Right-click and go to the properties panel; change the toggle restriction to "max one", which ensures that only one is active at any given time. Under match title, start with the circumflex symbol, followed by the group's exact name; separate multiple groups with a vertical bar symbol, and end the string with a dot followed by an asterisk. After turning off all the groups once, you can only enable one group at a time.

You may want to easily switch between samplers, so I am going to split the workflow. There is a good reason for splitting: it enables you to disable the upscaling process, so you can check the image resolution and decide on the upscale factor and tiling option first. Duplicate the SUPIR sampler four times and group each one separately; ensure a different sampler is selected for each one and name each group after the chosen sampler. Group the decode, color match, and compare nodes, and name the group SUPIR upscale decode as shown previously. Add another Any Switch rgthree node, connect the sampler latent outputs from the four nodes to it, then connect the switch to SUPIR decode. Now duplicate the Fast Groups Muter node and change the match title string to only include the sampler group names; only one sampler should be enabled at any given time. Test that it works correctly. Okay, so this group is the SUPIR upscale, and this one is the SUPIR upscale decode. We will add another rgthree muter and include these two groups only. This allows us to stop the upscaling process completely and enable it after selecting the tile and upscale factor. Since you must turn both groups on and off together, change the toggle restriction back to default. You can duplicate this node and put it at convenient locations for easy access.

The Gemini Pro node does an excellent job of describing the image in detail. Add the non-API version of the Gemini Pro node, switch the model to pro vision, connect it with the load image, and add the display text node. Change the prompt to "describe this image" and that is about it; you don't need to use long descriptions. Gemini Pro requires a pro account and API key. It may give an error for portrait images, as Gemini currently blocks those; however, it does a better job than ChatGPT or any other LLM for non-humanoid images. You can completely skip this, but for some images a description helps. For SUPIR, use short and precise prompts unless intended otherwise. The link to get the API key will be in the description. Go to the Gemini custom nodes folder, edit the config.js file, and enter your API key; save and close. Drag the nodes and place them below the SUPIR tiles group. Add a group and a Fast Groups Muter node from rgthree, and keep the node disabled unless required.

Let's summarize. We can load any image, check the resolution, and see whether it is divisible by 32. The image resize node is used to get the correct resolution divisible by 32; the image crop node crops the image if this resolution differs from the original, and if the resolution is the same, nothing changes. The image crop connects to the downscale factor, custom resolution, and upscale factor groups. Use the downscale factor to downscale the resolution, and use the custom resolution to define the upscale at a specific resolution. Here is the select input switch node, where you can choose downscale, custom, or the original image. The upscale by value will multiply the given input resolution by that value. When loading a new image, turn off "proceed with upscale", then check the fixed resolution and decide on the tile and sampler options. The upscale factor group will display the final resolution based on the previous inputs. The width and height outputs from the image resize node determine the full and standard tile sizes: the full tile size is exactly the same size as the upscale factor resolution, while the standard is half the size of the upscale resolution, and the default is a fixed 1024 regardless of the upscale factor. At any given point, only one of the tile groups should be enabled, as they connect to an any switch. The SUPIR tiles node is used to visualize the tile size; more importantly, the tile size and stride values are passed through this node to all the SUPIR nodes. Further on, Gemini Pro is used to get an image description if required; you can then use the custom keywords or phrases in the SUPIR conditioning as needed. By default, Gemini Pro is turned off, since Gemini blocks images of people and may give an error. A custom prompt to guide SUPIR should be entered into the conditioning node. Using the rgthree muter node, you can enable the final two groups and proceed with the upscale. The workflow has an image-to-image upscale group pending and is still not fully complete.

Let's start with the live examples and techniques. This is an extremely low-resolution image. Restoration is challenging, as we don't know what the actual photo looks like; however, here are some tips and tricks for restoring an image like this with high accuracy and consistency. Step one: you need to first add some details; once a minimal level of detail is established, upscaling is easy. The upscale factor is at two, and I will use the original image. I am not going to add anything to the prompt; let's see what happens. The AI clearly needs some guidance, because the image is very low resolution. Let's add something to the prompt: "black and white photograph of a young police officer, he is wearing a police hat and a white shirt". Better, but we can improve the resemblance. Let me explain: this face is checkpoint dependent. Since we are upscaling at 2x, the original image does not have enough details to maintain accuracy. In this case, you may want to see if you can add details at 1x only, without the upscale, and if that works, proceed with fixing it that way. Way better. Right-click and copy the image, select the load image node, and press the paste keyboard shortcut. Now let's try to upscale this by 2x. At this stage, since the image has enough details, we can simply use the prompt "man", because the AI tends to go towards a feminine generation if unspecified. You can see how much the noise has reduced. Let's do the same process again. While doing so, I am keeping an eye on the tile size: typically a 2K tile size works on 24 GB of VRAM, and anything higher crashes. There is no exact rule regarding how many times you should do this; basically, do it until you get the desired results. The full tile size is very high now, so I will switch to standard; the standard will still be better than the default at this stage. Okay, I still get a memory error. This is mostly because of the resolution: the upscale resolution is at 3K, so lowering the tile size to default would most likely work. Yep, that worked. The details are impressive, the face is very close to whatever is recognizable in the low-res photo, and the noise is far less as well.

This next image is generated in SDXL; however, the face is blurry. Let's first see the results of the 2x upscale at full tile size. I am going to include a prompt here, as I am specifically looking to get green eyes. Also, just to note, if you are wondering about the sampler settings, I am using the same settings as shown in the earlier part of the video; no change there. The settings I showcased for the sampler in the beginning maintain high accuracy and consistency; I have tried them on a variety of image styles, and the results are impressive. Once the image has this level of detail and 2K resolution, you can do a pure image-to-image upscale in less than 30 seconds. From the checkpoint, add empty conditioning. Now add the Ultimate SD Upscale node; connect the positive, negative, VAE, and model. Convert seed to an input and connect it with the existing rgthree seed node. Connect the image output and add the upscale model; I prefer to use the 4x_foolhardy_Remacri model here. Ensure the following settings carefully: upscale_by at four, steps and CFG the same as the SUPIR sampler, and sampler and scheduler as per the checkpoint. Very important: change the mode type to none; this ensures a pure image-to-image resolution upscale without adding any details. Rename the node to 4x upscale and update the match title settings for the "proceed to upscale" muter for easy on and off. You can see this is quick; it took about 21 seconds, and we have a very high resolution image. I recommend upscaling 4x only, as we use a 4x model; it's more optimized that way.

Now let's check out the results when I do a 3x upscale instead of a 2x upscale. The DPM++ 2M sampler doesn't work very well at this scale. I switch to the standard tile size. The details are overdone here: except for the eyes, everything else is overdone. We can fix this by switching the sampler to EDM, which gives a more balanced result; this is one case where I prefer EDM sampling during upscale. In the last example, let's look at one way to mix techniques to upscale another low-resolution image. Here I will input a custom resolution of 1500 and change the upscale factor to one. Since I now have a baseline detailed image, let's try a new technique: same as before, copy, paste, and load the image. Now let's downscale the image by 2x and upscale by 3x, and also change the sampler to tiled DPM. When using tiled sampling, it is recommended to go with the default tile setting, as standard or full may cause errors due to the resolution. Also, I will remove the prompt and just keep "photograph, high quality, detailed"; detailed prompting is not required at this stage.
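The tiling and divisibility rules described earlier are simple arithmetic; here is a minimal sketch. The 2048 output size is illustrative of the 9-versus-49 tiles example, and `snap32` is a hypothetical helper mirroring what the Image Resize node's multiple_of setting does:

```python
def tiles_per_axis(length: int, tile: int, stride: int) -> int:
    """Number of tile positions along one axis of the image."""
    return (length - tile) // stride + 1

def tile_count(width: int, height: int, tile: int, stride: int) -> int:
    """Total tiles the sampler processes for a given tile size and stride."""
    return tiles_per_axis(width, tile, stride) * tiles_per_axis(height, tile, stride)

def snap32(x: int) -> int:
    """Round a dimension down to the nearest multiple of 32 (SUPIR's requirement)."""
    return (x // 32) * 32

# A 2048x2048 output with 1024 tiles and a 512 stride gives a 50% overlap: 3x3 tiles.
print(tile_count(2048, 2048, 1024, 512))  # 9
# Halving both values keeps the 50% overlap but over-tiles the image: 7x7 tiles.
print(tile_count(2048, 2048, 512, 256))   # 49
# The 445x445 example image snaps down to 416x416 before cropping.
print(snap32(445))                        # 416
```

Keeping the stride at half the tile size is what preserves the 50% overlap at any scale, which is why the math-node groups always derive the stride from the size.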
Some artifacts are normal; the original image has a lot of motion blur. However, you can see the quality improvements. We can run this one more time; tiled sampling is preferred here due to the resolution. Running it again at 2x downscale and 3x upscale, this process takes a lot of system RAM; with this technique, I am at nearly 80% system RAM usage on a 64 GB system. And there you have it. You can now send this to the 4x image-to-image upscale, or downscale back to 1024 and 2x upscale again. I hope this tutorial was helpful and that you learned something new in ComfyUI today. Until next time.
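As a footnote, the three dynamic tile groups built with the math nodes (full, standard, default) reduce to one small function. This is a sketch under the workflow's assumptions (stride is always 50% of the tile size, and the width comes from the upscale factor group); the function name is hypothetical:

```python
def tile_settings(width: int, upscale_factor: int, mode: str) -> tuple[int, int]:
    """Return (tile_size, tile_stride) for the three tile groups in the workflow.
    full: tile size equals the upscaled width (best quality, most VRAM);
    standard: half the upscaled width; default: a fixed 1024.
    The stride is always 50% of the tile size, giving the 50% overlap."""
    upscaled = width * upscale_factor
    if mode == "full":
        size = upscaled
    elif mode == "standard":
        size = upscaled // 2
    elif mode == "default":
        size = 1024
    else:
        raise ValueError(f"unknown mode: {mode}")
    return size, size // 2

print(tile_settings(1024, 2, "full"))      # (2048, 1024)
print(tile_settings(1024, 2, "standard"))  # (1024, 512)
print(tile_settings(1024, 3, "default"))   # (1024, 512)
```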
Info
Channel: ControlAltAI
Views: 15,751
Keywords: stable diffusion, comfyui, supir, comfyui upscale workflow, comfyui photo restoration, supir vs magnific, comfyui upscale, custom nodes, stable diffusion tutorial, upscale for deepfakes, supir upscale, comfyui image to image, stable diffusion img2img, kijai, comfy upscaler, comfyui ultimate sd upscale, magnific, supir for deepfakes, upscale tutorial, img2img, upscaling
Id: WDGtHhE6sPY
Length: 46min 57sec (2817 seconds)
Published: Mon Apr 01 2024