StableDiffusion Deep Dive - HiRes Fix - How to avoid twinning and losing composition

Captions
Hello, I'm SiliconThaumaturgy, and welcome to my deep dive series. Ironically, generating good images in Stable Diffusion seems to be more of an art than a science, and there's a lot of conflicting information floating around the community about what settings to use. This inspired me to design experiments to isolate and test the impact of various settings. Today we're going to cover the hires fix. This feature got overhauled in early January, and some users are reporting the update actually broke the feature, but it works fine for me, so I'm going to roll with it.

If you've ever tried to use a non-default resolution for Stable Diffusion, you've probably seen some weird stuff: twins, triplets, Cronenbergs, fractals. In the end, it just kind of sucks to make aesthetic-looking stuff outside of a narrow resolution range. Sure, you can upscale afterwards, but even those advanced new upscalers can only take you so far before things start to look bad, and you can't get that extra little detail that makes a good-looking high-resolution image. One way to do it is the hires fix; not only that, it's currently the only way, unless you're good at scripting, to do it in a single step.

First, let me give you an overview of how hires fix works. Initially it does an image generation like any other. You typically want this to be as close to the default resolution of the model, usually 512 by 512, to limit the issues with generating images at high resolution, like twinning and fractalization. Next, it upscales the image with the selected upscaler, then adds noise based on the denoising slider. Finally, it performs another image generation with the upscaled, noised image.

For hires fix there are a lot more variables compared to the samplers and subject matter I tested in my last video, so my methodology was a bit more jumbled, I mean, rigorous and extensive compared to last time. First I did some initial runs, mostly testing denoising, the upscalers, and the samplers, to get a feel for how these worked with different kinds of subject matter. This allowed me to figure out which variables in the process were important. However, before we go into the important ones, we're going over which ones are not important, so you can stop worrying about them.

Let's start with steps and hires steps. Basically, since steps only impact the pre-upscale image, their impact is going to be based on the sampler you use: either you can get better or different results from non-converging samplers, or you're not really doing anything beyond 30 steps for converging samplers. So, shameless plug: if you want to know more about samplers, definitely take a look at my deep dive video on samplers, and it'll tell you which samplers converge or not, and so much more. On the other hand, I would recommend keeping hires steps low, around 30, regardless of what kind of sampler you're using. The image generation from the hires step takes a lot more computing power than the base image generation, and you really don't see much in the way of change for that effort. More hires steps appear to slightly reduce the amount of composition change caused by higher denoising levels, but not by much, only in non-convergent samplers, and this was so slight I'm not sure whether it's anything more than my subjective perception. Overall, I don't think there's enough difference between 30 hires steps and 150 hires steps to be worth the processing time. If you want to use hires steps as a variable, you can use it to get slightly different versions of the same image for non-converging samplers, but that's about it.
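To make that generate-upscale-noise-regenerate flow concrete, here is a minimal sketch of the same idea using the Hugging Face diffusers library rather than the webui itself. This is a hedged approximation: the model ID, the Lanczos resize standing in for a non-latent upscaler, and the parameter values are illustrative assumptions, not the webui's internal code. The point is that `strength` in the img2img pass plays the same role as the hires fix denoising slider.

```python
# Sketch of the hires fix flow with diffusers (assumptions: SD 1.5 model,
# Lanczos as a stand-in non-latent upscaler, 2x upscale, denoising 0.5).
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
from PIL import Image

model_id = "runwayml/stable-diffusion-v1-5"  # assumed 512x512 base model
prompt = "a portrait of a woman, detailed, photorealistic"

# Step 1: base generation near the model's native resolution (512x512).
txt2img = StableDiffusionPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
base = txt2img(prompt, width=512, height=512, num_inference_steps=30).images[0]

# Step 2: upscale the finished image (2x here, Lanczos standing in for a
# non-latent upscaler such as ESRGAN).
upscaled = base.resize((1024, 1024), Image.LANCZOS)

# Steps 3-4: add noise and re-denoise via img2img; `strength` acts like
# the hires fix denoising slider.
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")
final = img2img(prompt, image=upscaled, strength=0.5,
                num_inference_steps=30).images[0]
final.save("hires_fix_sketch.png")
```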
The next thing that doesn't seem to matter much is the sampler itself. I didn't see any issues with any of the five or six samplers I tested, regardless of subject matter, so just pick your favorite or favorites and let it do its thing.

Now it's time for the good stuff. I've identified four key variables for generating good images with hires fix: denoising, the upscaler, the amount of upscaling, and, surprisingly, subject matter.

Denoising is the first we'll cover. In general, as denoising increases, the complexity of the final image begins to increase, starting with small details and textures, until you finally get to the twinning and fractalization that we really want to avoid. While subject matter is, well, subjective, I've found that single subjects like a face, portrait, or animal are less prone to twinning or fractalization than multiple subjects or complicated subjects. If you do something like line art or manga, that tends to fractalize a lot faster than single subjects, so you really need to use low denoising or the complexity ramps up very quickly. At the default 2x upscaling, twinning and fractalization tend to emerge around 0.7 or 0.8 denoising for most subject matter, and at 0.5 or 0.6 for complicated subject matter and line art. However, this is impacted by the amount of upscaling, which we'll talk about later.

Next we have upscalers. Honestly, upscalers are probably complicated enough to earn their own deep dive, but for the purposes of this video I'm just going to divide them into two groups: latent and non-latent. Basically, the difference is that latent upscalers work on the latent, a.k.a. low-resolution native, image, while non-latent upscalers work on the full-resolution image created in the first step. Given that the latent upscalers scale up the low-resolution latent image, they look very blurry prior to denoising and reprocessing. Among the latent upscalers, the anti-aliased ones tend to be slightly simpler than the non-anti-aliased ones, and the basic latent and latent anti-aliased seem to be the most different from the other four. However, all the latent upscalers had pretty similar results after denoising was put back in; they were all more similar to each other than to the "None" upscaler. Since the latent image is a lower resolution than the fully generated image, using a latent upscaler is effectively downscaling the image. As a result, they need a lot more denoising to yield a decent result. I would say you can get lucky and get a decent-looking image at 0.4 denoising, but you're probably safer at 0.5. On the other hand, we have the non-latent upscalers. Each of these has its own flavor, which I'm not going to try to quantify here, but they do impact the details of the image, except for "None", which is essentially the true neutral.

Finally, the amount of upscaling you use is a critical factor in which settings will work. I ran the same image sets at different levels of upscaling, and you can see how much it impacts the output. Long story short, the more you upscale, the faster twinning and fractalization will become a problem. At 2x upscaling things are pretty robust, with twinning and fractalization starting around 0.7 or 0.8 for good subject matter; at 3x it's 0.6 or 0.7; and at 4x it's even lower, at 0.5 or 0.6.

Now we're going to put everything we learned together. I'm going to start with what's probably my most controversial recommendation: unless there's a key detail that I'm missing, I would just not use the latent upscalers. The downside of needing denoising of at least 0.4 to 0.5 to get a decent-looking image is severe, and I don't see any upside compared to using the non-latent upscalers.
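As a rough illustration of the latent versus non-latent distinction described above (a sketch only; the tensor shapes assume an SD 1.x model and the variable names are mine, not the webui's): a latent upscaler resizes the small latent tensor before the second denoising pass, while a non-latent upscaler works on the decoded full-resolution image.

```python
import torch
import torch.nn.functional as F

# For SD 1.x, a 512x512 image corresponds to a 4-channel 64x64 latent.
latents = torch.randn(1, 4, 64, 64)

# A "latent" upscaler resizes this small tensor directly (2x here); the
# anti-aliased variants are roughly analogous to passing antialias=True.
# The result decodes to a blurry picture until the second sampling pass
# repaints detail, which is why these upscalers need roughly 0.5+ denoising.
latents_2x = F.interpolate(latents, scale_factor=2, mode="bilinear",
                           antialias=True)
print(latents_2x.shape)  # torch.Size([1, 4, 128, 128])

# A non-latent upscaler (ESRGAN, SwinIR, plain Lanczos, etc.) instead takes
# the decoded 512x512 image and upscales it in pixel space, so it still
# looks sharp even before any extra denoising is applied.
```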
So yeah, stick with whatever your favorite non-latent upscaler is, or whatever upscaler is supposed to be best for your subject matter. There is no lower limit for denoising on the non-latent upscalers; however, as you decrease denoising, eventually it's going to approach just having the image upscaled. To get value from the hires fix, you probably want denoising to be at least 0.3 or 0.4.

For the amount of upscaling, 2x seems to be pretty robust for most subject types; however, the higher the upscale, the more limited you will be in the amount of denoising you can use before things fall apart. It's pretty hard to recommend doing 4x, because even with subject matter that does well with hires fix you're stuck in a pretty narrow range, so I would stick to 3x upscaling and below and use a separate upscaler or img2img if you need a bigger picture afterwards. For denoising, I would use between 0.4 and 0.7 at 2x for most subject matter, and between 0.4 and 0.6 at 3x. For multiple subjects, complicated subjects, or line art, stick to the lower end of this range, or maybe even go down to 0.3 for line art.

And that concludes our deep dive into the hires fix. I hope you found this video useful and enjoyable; if you did, please like and subscribe. Don't be afraid to leave a comment if there are any topics around AI image generation you'd really like to see a video on. Thanks, and goodbye.
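As a quick reference, the closing recommendations can be boiled down to something like the following lookup (an illustrative Python summary, not part of any UI's API; the numbers are the rule-of-thumb ranges stated in the video).

```python
# Rule-of-thumb denoising ranges for hires fix with a non-latent upscaler,
# keyed by upscale factor (illustrative summary, not an API).
RECOMMENDED_DENOISE = {
    2.0: (0.4, 0.7),  # robust for most subject matter
    3.0: (0.4, 0.6),  # tighter range as upscaling increases
}
# Multiple subjects, complicated scenes, or line art: stay at the low end
# of these ranges (down to ~0.3 for line art). Skip 4x hires fix; upscale
# separately or use img2img if you need a larger final image.
```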
Info
Channel: SiliconThaumaturgy
Views: 33,548
Id: sre3bvNg2W0
Length: 8min 19sec (499 seconds)
Published: Wed Jan 25 2023