Tiled Diffusion with Tiled VAE / Multidiffusion Upscaler, the Ultimate Image Upscaling Guide [A1111]

Captions
Hello everyone, Keyboard Alchemist here, and welcome back to another Stable Diffusion tutorial. Today we are going to talk about how to use the Tiled Diffusion (or MultiDiffusion) extension in Automatic1111 to upscale your images. With this extension we can upscale an image from a measly 512 x 768 to 4K resolution or above and, in the process, add amazing detail to your image.

Here are the topics that we will cover. First, we will go over how to install the MultiDiffusion upscaler extension. Second, we will walk through two methods using this extension to upscale images from 512 x 768 to 4096 x 6144. That is an 8 times increase in each dimension, resulting in a 25 megapixel final image, all running on my 8 GB 3060 Ti GPU. And of course you can always iterate this method to upscale your images even more. Lastly, we will go through how to best set your parameters to get great results, and which parameters you need to pay attention to and tweak to optimize this workflow. Without further ado, let's get started.

Go to the Extensions tab in your web UI, then click on the Available tab. Next, click on the "Load from" button. Search for "tiled" and the Tiled Diffusion and Tiled VAE extension will show up. Click on the Install button on the right-hand side to install it. After the installation is complete, go back to the Installed tab and click on "Apply and restart UI". Once your UI reloads, scroll down to see both the Tiled Diffusion and Tiled VAE sections.

Okay, so let's start with a portrait image that I have already generated previously. As we can see, this image is small, just 460 x 768 pixels. This is very tiny compared to an average cell phone photo, where the resolution is at a minimum 3024 x 4032, or 12 megapixels. So the plan is to upscale the image three times, each time increasing the size by 2x, and gain some additional detail in the image along the way.

For the first upscaling step, we are going to increase the image size to 920 x 1536. Here, since I am using an existing image, I will upscale in image-to-image
with a low denoising strength between 0.1 and 0.2, which will leave the image essentially unchanged as we upscale. If this were a text-to-image generated image, I would have used Hires. fix to do this first step, which works basically the same as image-to-image latent upscaling. I am using the 4x-UltraSharp upscaler (see the download link in the video description), but you can also substitute one of the built-in upscalers. I recommend using either R-ESRGAN or SwinIR. I will explain more about the upscalers, as well as all of the other Tiled Diffusion parameters, in greater detail later. Notice here I am using a low denoising strength of 0.2. Also, I am going to use ADetailer to fix up the face of this generated image. If you want to dive deeper into how to use ADetailer, I have a great video you can check out, but for now let's press the Generate button and get the upscaled image.

Just a recap: if you have an existing image, you will do the first 2x upscaling step in image-to-image (top half of the screen), but if you are generating your image in text-to-image, you should use Hires. fix (bottom half of the screen), using a low denoising strength in both cases. Once the first step is complete, I will click on the "Send image and generation parameters to img2img tab" button and use the now slightly larger image for our second upscaling step.

The second upscaling step is where we want to increase the amount of detail within our image. There are two different methods we can use, and depending on the method, we will need different amounts of denoising strength. Quick clarification: I am going to use this image of the girl with the black dress to illustrate the Tiled Diffusion only method, while using this image of the girl with the red dress to illustrate the tiled plus noise inversion and ControlNet tile method.

For the tiled only method, the sweet spot for denoising strength is somewhere around the 0.3 to 0.4 range. A value that is too low does not give you the detail that you need, and a
value that is too high will introduce unwanted artifacts. Setting this denoising strength value should not be arbitrary. I recommend using the X/Y/Z plot script to see how the denoising strength values affect the resulting image. If you are not familiar with the X/Y/Z plot script, take a look at my previous video where I go through it in detail (link in the top right-hand corner and in the description). With this plot we can see that the denoising strength cannot be too high, and that somewhere between 0.3 and 0.4 seems like a good spot, where you introduce a lot of new details into the image but avoid the artifacts that make the image look weird or overbaked. If you compare 0.4 and 0.5, at 0.5 it looks like the girl is no longer looking at the camera and is instead staring off into the distance. Here I have chosen 0.35.

In order to use the Tiled Diffusion extension, you simply click on the "Enable Tiled Diffusion" checkbox. For now you can simply follow the parameter settings that I have here; I will go into more detail and explain each parameter in the latter half of the video. We are going to use the Mixture of Diffusers method. Set both latent tile width and latent tile height to 128, set latent tile overlap to 8, use the 4x-UltraSharp upscaler, and leave the rest of the parameters at their default values. Further down the page, we want to enable Tiled VAE. This works together with Tiled Diffusion to significantly lower your VRAM usage, enabling you to upscale images to much higher resolutions. Again, leave the rest of the parameters at their default values. Once everything is set, click the Generate button.

Once the image is upscaled, let's just admire the added details and clarity for just a moment here. Okay, moment's over. Let's get started with the third upscaling step.

In the third upscaling step we are going to increase the image from 1840 x 3072 to 3680 x 6144. Assuming we have done everything correctly up to this point, we should have an image that already has great detail, so during this
upscaling step we are going to dial down the denoising strength again to avoid overbaking the image. Remember to click on the "Send image and generation parameters to img2img tab" button, then decrease the denoising strength to 0.15. You do not need to change any other parameters in the Tiled Diffusion or Tiled VAE sections, with one exception: I recommend you enable the Fast Encoder Color Fix option. When doing this third upscaling step, or upscaling the image to even higher resolutions, you might get a final image with faded colors. The higher the resolution, the more prominent this problem becomes. Here we see that the 3680 x 6144 image on the right is less vibrant than the smaller image on the left, but enabling the Fast Encoder Color Fix option will fix this issue and give us the bright and vibrant final image that we are looking for. In this example I did a fourth 2x upscaling, which amplified the color fading issue. The image on the right is a 12288 x 6144 image without Fast Encoder Color Fix, and the image on the left has this option enabled. The difference is night and day.

With Fast Encoder Color Fix enabled, let's generate the final image. Depending on the amount of VRAM you have and the size of the final image, this upscaling step will take a bit longer. With my 8 GB 3060 Ti graphics card, it takes about 4 to 5 minutes to complete, which is not bad. As we can see here, I am not maxing out my VRAM, only using 4.4 out of 8 GB. But in case you want to upscale your image even more, it will use up more VRAM, and if you max out your VRAM and have to use shared memory, then the generation will be much slower. And here's the final result, which is 3680 x 6144. See how sharp and crisp the image has become. We can clearly see that the details are fantastic. You can zoom in all the way and will not see any pixelation.

Now let's circle back to the second upscaling step and take a look at method number two, which is the tiled with noise inversion plus ControlNet tile method. That's a mouthful,
so I will refer to it as the noise inversion plus ControlNet method, or NI + CN for short. The benefit of using the noise inversion plus ControlNet method within the second upscaling step is that this method can help add new details to your image, whereas the Tiled Diffusion and VAE method serves only to enhance the existing details in your image.

Take a look at this comparison. The image on the left is upscaled with the Tiled Diffusion and VAE only method, and the one on the right with the noise inversion plus ControlNet method. We can see how much more texture is being added with the second method. The woman's dress now looks like it is made from a leathery fabric, the lines and folds on the dress are more defined, the shadows are [unclear], and everything just has more definition and more substance to it. Even the background branches and leaves are more detailed. By comparison, the image upscaled with the first method looks flatter. We can also compare some of the finer details when zoomed in. The hair strands look better defined, although I don't like the overly smoothed-out skin for the NI + CN method. This is a slight drawback of the noise inversion feature, but with some tweaking of the parameters within noise inversion we can minimize this effect. The noise inversion plus ControlNet method will make your upscaled image sharper and better defined. Just take a look at the comparison of the hands: the one on the right is clearly better. Lastly, we can also see details being added in the background. Here we see that the leaves and branches are much clearer for the image on the right. Even when there is a bokeh effect, this method handles it very well.

For the noise inversion plus ControlNet tile method, you will actually want a higher denoising strength, at least above 0.6. This might seem counterintuitive, but that is how this method works. It is because you are introducing a bunch of new details with ControlNet tile at a high denoising strength, which will make the image
look overly noisy and messy, but you will dial it back with the noise inversion method of MultiDiffusion, which produces overly tidy and retouched-looking images. These two processes work together to bring the image to an equilibrium state, so to speak.

Scroll down to the Noise Inversion section and click on the Enable checkbox. For now, go ahead and use the same parameters as I have here on the screen; this will provide you with good results. I will go into more detail about what each parameter does later in the video. Scroll further down to the ControlNet section. Here I'm going to assume that you already have ControlNet installed and have the ControlNet tile model loaded in the web UI. If you need a refresher, you can check out my IP-Adapter ControlNet video, where I go into detail about how to install the ControlNet extension and how to download and load a ControlNet model into the web UI. The instructions start at the 45-second mark in that other video. I'll put a card here and a link in the video description.

Put your 1024 x 1536 image from the first upscaling step into the reference image box. Click on the Tile radio button to enable ControlNet tile. Change the control mode to "ControlNet is more important" and the resize mode to "Just Resize". Then click the Generate button, or press Ctrl+Enter, to start upscaling.

Using the noise inversion plus ControlNet tile method to upscale from 1024 x 1536 to 2048 x 3072 takes about the same amount of time as the Tiled Diffusion plus VAE only method. However, if you want to upscale further, let's say for the third upscaling step, then there will be a significant jump in VRAM usage. For me it will exceed my 8 GB of VRAM and thus take a lot longer to generate. This brings us back to my previous recommendation for the third upscaling step: you should use the Tiled Diffusion and VAE only method for the third step of this workflow, and not the noise inversion plus ControlNet tile method. Assuming you have gained
sufficient detail in your image in the second upscaling step, you will not get a higher quality final image using the noise inversion plus ControlNet method, and it will take you 30 minutes to an hour to complete an upscaling step that should only take about 5 minutes. Of course, if you have a more powerful GPU, this might not be a concern.

Now, just to recap the workflow: we are upscaling the image three times in total, each time increasing the image size by 2x. During the first upscaling step, we are making the image higher resolution without changing the image. You can do this with Hires. fix or image-to-image latent upscaling. During the second upscaling step, we want to introduce additional details to the image while upscaling at the same time. You have two methods to do this: the tiled only method and the noise inversion plus ControlNet tile method. The tiled only method is faster, but the noise inversion method can help you introduce additional details. Then, in the third upscaling step, we are making the image higher resolution again without changing the image much, by keeping the denoising strength low. After this, if you want to increase the size of the image even further, you can just repeat the third step again, but of course each iteration will take you a bit more time because there are more tiles to process. In the end you will get a higher resolution image with more clarity and detail than the original.

Hey, if you made it this far into the video, I hope it is because you like the content and found the video helpful. If that is the case, don't forget to hit the like button and subscribe to support this channel. Your likes and subscriptions are going to allow me to continue making quality content for viewers like you. Thank you.

Now we are going to dive a bit deeper into how each of the settings works within the Tiled Diffusion (or MultiDiffusion) extension, starting with the Tiled Diffusion methods. These are just two different tiling algorithms: the Mixture of Diffusers method and the
MultiDiffusion method. Based on my testing, when doing upscaling there seems to be little to no significant difference in image quality between the two algorithms. In this image we can see that the shadows are a bit different, but that's about it. It's very hard to notice if you aren't comparing the images by overlaying them like I'm doing here. Even though there was no significant visual difference in my testing, the author of the Tiled Diffusion extension recommends using Mixture of Diffusers because this method requires less tile overlap, since it uses Gaussian smoothing. Using less tile overlap is advantageous because theoretically you run fewer tiles and can generate your image faster. I think this probably only comes into play if you are upscaling your image to very large resolutions.

Now let's talk about the width, height, and overlap parameters. You want to choose the right tile size for your image. A larger tile size will increase the speed because there are fewer tiles to process, but the optimal size depends on the model or checkpoint. According to the author of this extension, since SD 1.4 and 1.5 were trained on 512 x 512 images, this is like a lower boundary, and 1280 x 1280 is an upper boundary, since most checkpoints will not produce good images larger than 1280 x 1280. In the latent space these dimensions are divided by eight, so your tile width and height should be between 64 and 160. From my testing, I found that a 128 x 128 tile width and height works very well most of the time.

The size of the tile overlap is a bit arbitrary. The author mentioned that the size of the overlap depends on which of the diffusion methods you are using: for MultiDiffusion use 32 or 48, and for Mixture of Diffusers use 16 or 32. I found that when using Mixture of Diffusers, even an overlap of 8 works well. As long as you are not getting visible seams in your resulting image, I think you can select the smallest value here that will do the job. Maybe try starting with 8 and increase
it if you need to. Also, too much overlap will introduce noise and artifacts into your image. For example, here we have some random objects showing up in the background, the eyes getting overly messy, and things melting together. When your image becomes larger as you upscale more and more, at some point you will want to increase your tile width and height. This doesn't come into play with the method that I showed you earlier in the video, because the image is not large enough for the tile size to affect your resulting image. But if you go one step larger, you will want to use a larger tile size, otherwise the resulting image will be blurry and the color will be desaturated. Another way to fix the color desaturation issue is by checking the Fast Encoder Color Fix option in Tiled VAE.

Here is a quick optimization tip with the batch size parameter. You might want to try bumping up the tile batch size to a number higher than four. The higher the batch value, the more tiles are processed together in one batch. Of course, increasing the batch size will increase your VRAM usage, so you want to find that sweet spot where you are close to maxing out your available VRAM without going over. This will make your image upscaling a bit faster, but if you go over the VRAM limit and start to use shared RAM, then your image generation will slow way down. So you want to pick the right batch size value. For me, I can only do five tiles in a batch at one time, where I was using about 6.6 GB out of 8 GB of VRAM.

Now let's talk about the different upscalers. There are quite a few upscalers you can use. There are some default ones, and I have some downloaded ones installed. Let me show you how to download and install one of these upscalers, for example the Remacri upscaler. You can download it from OpenModelDB; I'll put a link in the video description. This website has tons of upscalers that you can use, and each upscaler is good at something specific. For example, the Remacri upscaler is supposed to be good at restoring
photos with lots of detail in them. After you download the upscaler file, put the file into your Stable Diffusion Automatic1111 folder under models/ESRGAN. Then, after you restart your web UI, you should be able to see it in the drop-down list here.

Let's compare the different upscalers and the resulting images they provide. First, we have two interpolation methods, Lanczos and Nearest. These are your more traditional mathematical formulas that can be used to interpolate a digital signal between its samples. If we look at the resulting images, we can see these two methods produce a more pixelated image; it's more apparent when you zoom in. There are better upscalers available, so we don't have to use these.

The LDSR upscaler provides good results but is too slow. We can see that the resulting image quality is on par with, or maybe even better than, the R-ESRGAN 4x+ upscaler. However, this method took over 9 minutes to generate the upscaled image, while all other methods only took about 2 minutes, so let's not use this one.

The R-ESRGAN 4x+ upscaler is a good default upscaler. We can see from the example images that this upscaler provided good quality upscaled images and did not take too long to run. In fact, a lot of the custom downloaded upscalers, such as the 4x-UltraSharp or 8x_NMKD-Superscale upscalers, which I will show you later, are based on the ESRGAN model. The R-ESRGAN 4x+ Anime6B upscaler is similar to the R-ESRGAN 4x+ upscaler; it's just more specialized for anime models. As we can see, the resulting images here are not much different from the R-ESRGAN 4x+ upscaler.

I like the SwinIR 4x upscaler. It is a different family of model compared to the ESRGAN upscalers, but it provides good details and a clear resulting image. So if the ESRGAN-based upscalers are not giving you adequate results, you should try SwinIR 4x.

The last two default upscalers are the ScuNET GAN and ScuNET PSNR upscalers. These two upscalers are okay, but the resulting images are more
blurry than those from either the ESRGAN or SwinIR upscalers, so I don't see a real need to use these.

Now, in this last set, I wanted to show you four downloaded upscalers that I use most often. There is the 4x-UltraSharp upscaler. This upscaler is very popular because it provides a very sharp, detailed, and clean-looking resulting image. It provides better details than the default R-ESRGAN 4x+ upscaler, so you really can just go straight to this upscaler, and in most cases it will provide you with a great result. Now, in some use cases you might not want to use the 4x-UltraSharp upscaler. Maybe you need to generate a more photorealistic image where you can see more pores on the skin, or a more grainy texture in the image. Then maybe you can try one of these other three upscalers: 4x_NickelbackFS, 8x_NMKD-Superscale, or ITF SkinDiffDetail. These are all based on the ESRGAN model, so the resulting images won't be radically different, but there are some subtle differences. If we take a look at the zoomed-in images, we can see that these three upscalers produce images that are more grainy than the 4x-UltraSharp upscaler. There are definitely more visible pores and blemishes on the skin, and for the ITF SkinDiffDetail upscaler there's even a change in the skin tone to a warmer shade compared to the other upscalers. Needless to say, when it comes to upscalers there is no one-size-fits-all upscaler. It's a safe bet to start with 4x-UltraSharp, but if you need something different, you also have other great options available to you.

Now let's go through the tiled noise inversion settings. I didn't find a lot of documentation on how noise inversion works, but my guess is that it takes the renoise kernel and convolves the original image with it, which adds noise back into the original image. This is why we see that the image gets noisier as the model goes through the renoising steps. The additional noise is what creates the additional detail at the end, after the denoising steps are done, because the AI model will take the added noise into account and make sense of it with the help of our ControlNet reference image, essentially turning the added noise into fine details, while ControlNet keeps everything coherent.

First, let's look at the renoise strength value. Renoise strength goes from 0 to 2. When we increase the renoise strength, there are some additional foreground details, but we lose a bit of coherence in the background. For example, the stone on the ground has a different pattern on the right side, the earrings are a bit messed up, and we lost some details in the background buildings and steel bars. In most cases I would prefer zero or a very low value as opposed to the higher values, but this is just my personal preference.

The renoising steps parameter goes from 1 to 200. The more steps you use here, the longer it will take to generate the upscaled image, but it does add more detail to the image. Just look at the girl's clothing and the background leaves. But there is a trade-off, because higher renoising steps seem to make the image a bit more blurry and messy. I think a middle-ground value between 50 and 100 is good, but depending on the image you are working on, you might want to try a range of values to see what works well.

Lastly, the retouch parameter goes from 1 to 100. Here we used 1, 20, 40, 60, 80, and 100 as test values. Retouch seems to be the equivalent of someone taking a smudge tool in Photoshop and blurring or smoothing out the whole image. The higher the value, the more blurry things become. Most of the time it's okay to keep this value at the default of 1, but in some situations you might want to increase this number because it makes the image look less busy and messy. Keep in mind it will blur out some of the fine details.

A brief explanation of the Tiled VAE section: normally the default tile sizes should work just fine. You might have a different encoder tile size and decoder tile size than what I'm showing here,
because they are set automatically by the system depending on your graphics card. But if you run into the CUDA out of memory error, or if you max out your GPU memory and start using shared memory, you might want to lower the two tile sizes.

Last but not least, let's talk about the ControlNet tile settings. There is really nothing that you need to mess with in the ControlNet settings, since the default values work well, but I wanted to test out the down sampling parameter to see what kind of effect it has on the image. I found that at higher down sampling values there tend to be some weird artifacts. At 7 and 8, there seems to be a pair of eyes that spontaneously generated in this area. For 4, 5, and 6, there are some artifacts in the area where the right hand and the cloth meet, but it's not too noticeable unless you zoom in closer. There are no noticeable differences between the lower values of 1, 2, and 3, so I say you can leave this value at the default of 1, because you are not going to get any tangible benefits from increasing it.

Okay, now that you know how all of these parameters affect your resulting image, you can tweak the parameters with the help of the X/Y/Z plot script to your heart's content. But if you are looking for a quick and easy set of parameters that will generally work well, here are my suggestions. You can also find these in the video description.

All right, I know this video is already pretty long, but I can't help myself: here is some bonus content. Sometimes after you have upscaled your image, you may find a small area of the image that is not on par with the rest. In my example, I found that the eyes were a little blurry and seemed to have changed a bit after the upscaling. In some cases you might even find the eyes are different colors. This is where we can do a quick inpainting to adjust them. First, bring your final upscaled image into the Inpaint tab. Change the positive prompt to "beautiful eyes" and keep your negative
prompts the same. Then draw a small mask over each eye. Notice here I'm just covering the eyes and a little bit of the surrounding area, and that is good enough. You don't want to make the masks too large. Here are the settings that I am using. Note that I'm using a denoising strength of 0.5, which provides the right amount of change for the eyes after inpainting. This is a middle-of-the-road value that works well, but you might want to adjust it for your image as needed. Be careful not to make it too high, as it will generate artifacts. The Tiled Diffusion and Tiled VAE settings are the same as before. Here you can experiment with a few different upscalers and see which one works best. Once all your parameters are set, hit the Generate button.

I'm skipping ahead here to the final result, since the generation does take a few minutes to complete. Here we can see that the inpainted eyes look more detailed and more vivid compared to the original. In my case, I might want to reduce the denoising strength to around 0.4 to get rid of this artifact on the right eye.

Okay, that's all for today. Please hit the like and subscribe buttons to support this channel, and I will see you in the next video.
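
The size arithmetic in this workflow (three 2x steps, latent sizes at one eighth of the pixel size, tiles of 128 with an overlap of 8) can be checked with a short Python sketch. This is only an illustration under stated assumptions: the function names are hypothetical, and the tile count uses a plain sliding-window formula that may not match what the extension does internally.

```python
import math

LATENT_SCALE = 8  # SD 1.x latents are 1/8 of the pixel size per side (from the video)


def upscale_schedule(width, height, steps=3, factor=2):
    """Return the pixel size after each successive upscaling step."""
    sizes = []
    for _ in range(steps):
        width, height = width * factor, height * factor
        sizes.append((width, height))
    return sizes


def megapixels(width, height):
    """Pixel count in millions."""
    return width * height / 1_000_000


def tiles_per_axis(latent_len, tile, overlap):
    """Assumed sliding-window count: windows of `tile` with stride `tile - overlap`."""
    if latent_len <= tile:
        return 1
    return math.ceil((latent_len - overlap) / (tile - overlap))


def tile_grid(width, height, tile=128, overlap=8):
    """Return (columns, rows) of latent tiles for a pixel-space image size."""
    latent_w = width // LATENT_SCALE
    latent_h = height // LATENT_SCALE
    return (
        tiles_per_axis(latent_w, tile, overlap),
        tiles_per_axis(latent_h, tile, overlap),
    )


if __name__ == "__main__":
    # Walk the 460 x 768 portrait through the three 2x steps from the video.
    for w, h in upscale_schedule(460, 768):
        cols, rows = tile_grid(w, h)
        print(f"{w} x {h}: {megapixels(w, h):.1f} MP, {cols} x {rows} tiles")
```

Under this assumed formula, each 2x step roughly quadruples the number of tiles, which is consistent with the later steps taking noticeably longer. Raising the tile size or lowering the overlap reduces the count.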
Info
Channel: Keyboard Alchemist
Views: 3,878
Keywords: stable diffusion, automatic 1111, stable diffusion tutorials, a1111, AI Art, AI, Tips and Tricks, Tutorials
Id: 44waH3sDYOM
Length: 24min 2sec (1442 seconds)
Published: Tue Feb 13 2024