ControlNet Deep Dive - OpenPose - What it can detect and output plus weight and guidance settings

Captions
It's 10 a.m., March 5th, 2023. After your morning coffee you think, "Hey, I haven't checked to see what's new with Stable Diffusion in a while. I bet they have some cool new stuff." So you go to your social media of choice and, oh my god, did they rename Stable Diffusion as ControlNet or something? Because literally everything here is about ControlNet. But when you go to try the shiny new toy for yourself, you realize there are a whopping 10 preprocessors and eight models to choose from, which is downright intimidating. That's where I come in. I'm SiliconThaumaturgy, and today we're going to deep dive into ControlNet so you can make the best use of this amazing tool and control the composition of your AI art.

Today's video will cover OpenPose. While OpenPose is, in my opinion, the most intuitive among the preprocessors, there is still a lot to cover, and I'm going to give you the in-depth testing results and figures that you've come to expect from me.

First, let's start with the basics. When you use the OpenPose preprocessor, it looks for human figures in the image you feed it, then generates a skeleton consisting of specifically colored dots and lines. The central dot connecting everything is located between the shoulders. Above it, small branching lines show the location of the eyes and ears. Below it, the first line segment for each limb marks the boundaries of the torso, which are the hips and shoulders. The next two line segments are for the arms and legs, with the outermost dot representing a wrist or ankle.

Unfortunately, the version of OpenPose used in ControlNet does not incorporate hand or foot positioning, which is one of its weaknesses, so if you want precise hand position in your image, you'll have to use multi-ControlNet to get it. OpenPose also does not convey any information about the background or any non-human objects, which can be either a blessing or a curse depending on what you're trying to do. For this, you can either try to use your prompt to specify these things in the image, or use multi-ControlNet if you're looking for something very specific.

Naturally, it wasn't enough for me just to say that OpenPose makes skeletons based on human figures. I mean, what's the fun in letting something work when you can push it to the limit and see where it breaks?

First, let's talk about the most common case: partial human figures. When OpenPose attempts to identify a human, it identifies it based on the points, not the lines. It starts from the central point between the shoulders and works its way out. If a point is outside the frame, OpenPose will guess where the next point is. However, any further points attached to that one will not be detected, regardless of whether they're in the frame or not. For example, if the elbow is outside the frame, then the wrist will not be detected even if it's within the picture. Since all other points are connected to the point between the shoulders, OpenPose will not detect anything if the subject's shoulders are not in the picture.

Next, I wanted to figure out how small a figure OpenPose could pick up, but instead I decided to take a walk through a tranquil meadow. Just kidding. OpenPose continued to pick up the figure, but with some degraded accuracy, until the figure was shrunk to 220 pixels, or a little bit over 20% of the total image height. I would guess that the 20% of the image height is more important than the number of pixels here.

I was curious if OpenPose would have trouble with toddlers and babies, because their heads are much larger relative to their body and limbs compared to adults. Realistic depictions of both toddlers and babies worked fine, so you should have no problem detecting people of any age with OpenPose.

But what if you want to use OpenPose on your cat, Sir Fluffy Kittens? Well, unfortunately, all the animals I tested, including dogs, cats, and monkeys, could not be detected by OpenPose, so you'll have to use another model for your pet. Interestingly, OpenPose also could not detect blocky humanoid robots. On the other hand, demi-humans such as mermaids, harpies, and minotaurs worked well, though occasionally they lost the ends of their limbs during detection. For centaurs, only the front set of legs was detected, and it did not match the actual horse's legs, so it looks like it assumed that the body was actually the leg instead.

Finally, I tested a variety of cartoon styles to see what worked. Anime generally worked pretty well, and some Futurama characters were also detected. Everything less realistic, like Powerpuff Girls and SpongeBob, did not work.

Weight and guidance are the two critical parameters for ControlNet. They both determine how much influence the base image has on the output, but in different ways. Guidance is the portion of the image generation steps for which the ControlNet image is used. Even when guidance is set to zero, there is still some impact on the final image compared to an image generation without ControlNet. I interpret this to mean that ControlNet will always be used for at least one step. In contrast, weight is how strongly the map is used when generating the image. If weight is set to zero, you'll get an image that is identical to the generation that would occur if ControlNet was not used.

Like all ControlNet models, adherence to the map, in this case the OpenPose skeleton, is dependent on the particular values of weight and guidance. Setting lower values gives the AI more flexibility to modify the pose, which can make it look more natural in some cases. If weight and guidance are too low, you won't get the particular pose you're looking for. Fortunately, compared to other preprocessors like, say, HED or Canny, going too high on weight and guidance with OpenPose isn't that damaging to image quality, but it can still force the subject into an awkward pose.

Unfortunately, the correct values for weight and guidance will depend upon the pose itself. More unusual poses, like this yoga pose, require higher weight and guidance to get correct. More common poses make it easier for Stable Diffusion to actually portray the pose. If the particular pose you want has one or more limbs in an unusual position, you are going to need to keep weight and guidance high to ensure you get that result.

Now that you understand what weight and guidance do, let's talk specifics about the ranges for these variables that get the best results. First, let's talk about guidance. When Stable Diffusion generates images, it takes random noise and gradually transforms it into an image we can appreciate. During this process, the major details are established first, and then minor details are added later. Because the major details of the image are set early on during image generation, lowering guidance has little effect until you get below 0.5, which is 15 steps for these examples.

At this point, some of the more knowledgeable listeners might be asking: what about ancestral samplers like Euler a that add noise back in during image generation? Wouldn't that increase the impact of guidance? That was my initial thought as well. However, at 30 steps, the difference in the impact of guidance between Euler a and DPM++ 2M wasn't noticeable. Then I thought maybe if it was 150 steps instead, it would make a difference. But no, even at that many steps there is no substantial difference. Interestingly, the newest sampler, UniPC, is not impacted by guidance at all, which is probably because it converges the fastest among all the samplers, in as few as eight steps. Because guidance has little to no impact for most of its range, I would not recommend using it as your primary variable for controlling ControlNet.

Which takes us neatly to weight. As I mentioned before, the precise values for weight are going to depend on how unusual your pose is, but here are some guidelines. Generally, you can reduce weight down to around 0.8 before the output starts breaking free from the OpenPose skeleton a substantial amount of the time. Once you start to lose cohesion, though, things go downhill very quickly. At 0.75, I estimate you get a matching result about three quarters of the time. At 0.5, it's maybe one in three. At 0.3 and below, getting something resembling the OpenPose skeleton is basically pure luck. For more challenging poses, you could need to set weight above 1.0 to get the full pose, as we've seen with the yoga pose. For most things, though, keeping the default value at one should work fine. Much like guidance, I didn't see a significant impact on weight from either samplers or steps, so these recommendations are universal across those conditions.

And now, for the moment you've all been waiting for, it's time for the charts. These charts are meant to represent the accuracy of OpenPose with a standard human pose. If you use something hard like a yoga pose, the accuracy will be lower than estimated here.

For the first chart, we have UniPC at 30 steps. Since UniPC is not impacted by guidance, the only effect here is from weight. You should be fine until around 0.8, with accuracy falling off until 0.3, after which it is basically pure luck.

This next chart is for all the samplers except UniPC at 30 steps. At a guidance of 0.5 or above, the chart is identical to the UniPC chart, because guidance really isn't a factor at these levels. Once again, accuracy starts being lost below weights of 0.8, and OpenPose is basically useless below 0.3. When guidance is lower than 0.5, you start to see a decrease in accuracy as well. A guidance of 0.4 to 0.3 can be hit or miss, and at 0.2 OpenPose is really going to start to struggle. Any lower than that is going to be pure luck.

Now that we've established ranges for weight and guidance, how do we make use of this? First, I would recommend keeping guidance and weight at one initially to see if you get decent results. I would run batch counts of at least four so you aren't relying on the luck of a single seed to make this determination. If you aren't getting the pose you want, bump up the weight even higher. One of the advantages of OpenPose is that high weight and guidance won't damage image quality too much. If you feel like the results look forced or unnatural, you could decrease the weight so the subject isn't forced into an awkward pose. I would only touch guidance at the very end to fine-tune an image that you like. Let's say you have a half-formed arm or something out of place in the pose. Lowering guidance could help Stable Diffusion incorporate that into the picture better by giving it more time to work on it.

Last but not least, let's talk about what OpenPose is capable of outputting. Basically, you can get anything that has a humanoid form to work with OpenPose, whether that is babies, aliens, or Optimus Prime. When your prompt does not include a humanoid entity, OpenPose begins to struggle. The closer the subject is to a humanoid form, or the more easily the subject can be molded into the pose, the easier it will be for OpenPose to get it right. For example, prompting for a tree gets way better results than prompting for an airplane. For subjects that only partially match the human form, like, say, a mermaid, the results will be deformed, and the texture, in this case fish scales, will be applied on top.

In the worst scenario, OpenPose will just put a person or a silhouette of a person in your picture regardless of your prompt. Sometimes this can be incorporated into your picture well, like as a statue at a temple or by a pyramid, but more often than not it is going to be obtrusive and out of place. Fiddling with weight and guidance can help improve this, but realistically, why would you? If you don't want a humanoid form, there are other ControlNet modules that can outperform OpenPose, so use them instead.

For being the simplest among the preprocessors, there was probably a lot more depth to OpenPose than you expected. To review: OpenPose is great at capturing the general position of the body, but cannot capture the precise position of hands and feet or information about the background of the image. OpenPose is robust when capturing realistic human subjects of all ages when the subject is at least 20% of the height of the image. OpenPose is robust at higher guidance and weight settings compared to other preprocessors, which allows you to be more forceful in order to get the particular pose you want. However, like its detection, OpenPose is also limited to outputting humanoid figures, and it will put them in regardless of what you want.

I hope you learned everything you wanted to know about OpenPose. If you enjoyed this video, please be sure to like and subscribe so you can see my future tutorials on other ControlNet modules. Till next time.
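
The guidance and weight figures quoted in the captions can be sketched as a couple of small helpers. This is only an illustration under stated assumptions: the function names are hypothetical and are not part of ControlNet or any web UI, the rounding rule is a guess, the "at least one step" floor reflects the narrator's interpretation rather than confirmed behavior, and the weight bands simply restate the narrator's estimates for a standard human pose.

```python
def controlnet_active_steps(total_steps: int, guidance: float) -> int:
    """Estimate how many sampling steps use the ControlNet map.

    Hypothetical helper: guidance is treated as the fraction of steps
    during which the map is applied (0.5 of 30 steps is 15 steps).
    """
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    if not 0.0 <= guidance <= 1.0:
        raise ValueError("guidance must be between 0.0 and 1.0")
    # Assumption taken from the narrator's interpretation: even at
    # guidance 0 the map seems to be used for at least one step.
    return max(1, round(total_steps * guidance))


def weight_outlook(weight: float) -> str:
    """Restate the narrator's rough estimates for a standard pose."""
    if weight >= 0.8:
        return "pose usually followed"
    if weight > 0.3:
        return "hit or miss"
    return "pure luck"


if __name__ == "__main__":
    for guidance in (1.0, 0.5, 0.2, 0.0):
        print(guidance, controlnet_active_steps(30, guidance))
    for weight in (1.0, 0.75, 0.5, 0.3):
        print(weight, weight_outlook(weight))
```

Unusual poses such as the yoga example would likely fall into a worse band than this sketch suggests at the same weight.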
Info
Channel: SiliconThaumaturgy
Views: 6,783
Id: 0RGYwnfxTRo
Length: 11min 49sec (709 seconds)
Published: Sat Mar 25 2023