How ControlNet v1.1 Is Revolutionizing AI Art Even Further

Video Statistics and Information

Captions
[Music] The release of ControlNet back on February 10th has taken the AI community by storm. Why? Because it has changed how precisely we can generate images. Even though it's only two months old, in the AI timeline it already feels ancient. But that's okay, because ControlNet 1.1 has just been released. So in today's video I will quickly review the new developments since my first ControlNet video that you may have missed, and show you what's better, what's different, and what's new about ControlNet 1.1.

For those of you who don't know what ControlNet is: it essentially allows you to provide a reference image to help generate your images more accurately with text for Stable Diffusion models. The reference image can be of different types. In the ControlNet 1.0 release there were eight total models officially published, each of which lets you control your generation in its own unique way: for example, Canny edge, MLSD lines, HED boundary, scribbles, human pose, semantic segmentation, depth, and normal map. We were also teased with an unreleased lineart colorization model. Initially these official ControlNet models were only trained on Stable Diffusion 1.5; however, some new implementations were made shortly after to generalize ControlNet 1.0 models to all the other different base models, including fine-tuned models based on SD 1.5 like Anything V3, or SD 2.1. There are also fp16 models that run faster on GPU, and additions like the concept of Multi-ControlNet, which lets you stack multiple references as input, like depth plus human pose, in a single generation. Most of these were optimized and included in sd-webui-controlnet, the ControlNet extension for AUTOMATIC1111, as well as in another paper called T2I-Adapter, which is very similar to ControlNet and was published only a short while after it.

Some ControlNet models also have preprocessors: tools that obtain a specific type of reference image from an ordinary image, which often requires special models to analyze the image and extract the desired type of reference. This way you can use the preprocessor to generate a reference if you don't want to make the reference manually. Some models don't have a preprocessor available, like transforming an image into scribbles, but others do, such as extracting a depth map from an input image. However, using preprocessors is not always necessary, so it really depends on your use case; some people will use external tools to pose a stick figure, so it's pretty flexible to achieve the controls that you want (a minimal code sketch of this workflow follows below).

What's even better is that ControlNet also provides ways to train your own ControlNet, allowing for community-made models like face landmark, Uncanny Faces, MediaPipe Face, and ZoeDepth, which are great alternatives to choose from besides the official ControlNet models. One of the coolest ControlNet studies I've seen is from the Twitter user toyxyz. They conducted different studies of various angles, directions, poses, and amounts of figures to see how well ControlNet performs with these inputs. This includes face landmarks mixed with Canny edge, outline and pose, depth plus pose, pose amount and distance, face amount and directions, semantic map and pose, and so much more. They also propose a very cool workflow that involves posing figures and especially faces in 3D with Blender and using the 2D view to screenshot and generate the results. You can explore their blog here; just be warned, their Twitter timeline is sometimes slightly not safe for work.
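To make the reference-plus-preprocessor workflow described above concrete, here is a minimal sketch using the Hugging Face diffusers library with the 1.1 Canny model. This is an illustration, not code from the video; the input file name and prompt are placeholders:

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Preprocessor step: turn an ordinary photo into a Canny edge reference image
source = load_image("reference.png")  # placeholder input image
edges = cv2.Canny(np.array(source), 100, 200)
edges = np.stack([edges] * 3, axis=-1)  # single channel -> 3-channel RGB
control_image = Image.fromarray(edges)

# Attach the ControlNet to a Stable Diffusion 1.5 pipeline. For Multi-ControlNet,
# diffusers also accepts a list of ControlNets plus a list of reference images
# (e.g. depth plus human pose) instead of a single one.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# The text prompt steers content while the edge map constrains the layout
result = pipe("a cyberpunk city street at night", image=control_image).images[0]
result.save("output.png")
```

Skipping the cv2.Canny step and supplying a hand-drawn edge map instead corresponds to not using a preprocessor, as the captions note.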
What's even cooler is that ControlNet has been developed to generate coherent videos too. TemporalNet, or the grid method, which I mentioned in a recent text-to-video video, is a way to generate or style-transfer videos with much less flickering. It relies on the interesting fact that multiple images will be highly coherent if they are part of the same image generation. This video style transfer works by taking a few frames out of a video, combining those frames into a single image like a grid, styling it with image-to-image, splitting them back apart, and then interpolating between them with EbSynth or DaVinci Resolve's deflicker tool (the grid step is sketched in code below). This naturally reduces flickering between the frames compared to generating frame by frame, but unfortunately the video length is then limited by how large a grid image your hardware can process, so it usually can't go beyond a few seconds.
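As a rough illustration of the grid trick just described, here is a sketch of the two bookkeeping steps around the image-to-image call, assuming equally sized frames (frame extraction, the diffusion call itself, and the interpolation step are left out):

```python
from PIL import Image

def frames_to_grid(frames: list[Image.Image], cols: int) -> Image.Image:
    """Paste frames into one grid image so img2img styles them all together."""
    w, h = frames[0].size
    rows = (len(frames) + cols - 1) // cols
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, frame in enumerate(frames):
        grid.paste(frame, ((i % cols) * w, (i // cols) * h))
    return grid

def grid_to_frames(grid: Image.Image, cols: int, rows: int,
                   w: int, h: int) -> list[Image.Image]:
    """Cut the stylized grid back into individual frames, row by row."""
    return [
        grid.crop((c * w, r * h, (c + 1) * w, (r + 1) * h))
        for r in range(rows)
        for c in range(cols)
    ]
```

Because all frames pass through the diffusion model as one image, they share a single generation and stay coherent; the cost is that the grid's total resolution is bounded by your hardware, which is exactly the length limit mentioned above.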
However, this may all change soon, as ControlNet 1.1 has proposed some really interesting experimental models, along with other improved ones, that could change image-based stylization and editing. No major adjustments like changing the neural network architecture were made, which is actually good news, since it means the 1.0 architecture works well so far.

For the improvements: most of the official 1.1 models were trained with an Nvidia A100 for 200 GPU hours, generally resulting in increased robustness and quality. The improved models include depth, normal map, Canny edge, MLSD, scribbles, soft edge, segmentation, human pose, inpaint, lineart, and a specific anime lineart model that was not released until ControlNet 1.1. With these improved ControlNet models, workflows that include Latent Couple, such as lineart colorization, can provide more precise edits in a content-rich image. To explain what Latent Couple is in a single sentence: it is basically a semantic AI paint bucket for regions you specify, like having the hair be orange in one region but black in another.

For the experimental models, there is now an Instruct Pix2Pix model, which looks extremely promising compared to the original InstructPix2Pix paper and can perform edits that look much more realistic. Then there is this very fascinating Shuffle model that is trained to recompose images. I was initially quite confused about what it actually does, but it seems to be an image stylization method that does not require any CLIP-related functions, which is very interesting. It feels like what tokenization methods such as LoRA or DreamBooth can do, but it doesn't rely on tokenization: the input image acts as a base, the reference image serves as the style, and you can use the prompt to guide what the resulting image looks like based on these two images. Now I understand why these two models are called experimental, because there are so many interesting experiments that can be done to see their limits, and there is even the possibility of eliminating the need for DreamBooth or LoRA in cases where you don't want to train a model. On a side note, ControlNet's author Lvmin Zhang (lllyasviel) has also repeatedly stated that this is the only stylization method that will be developed and maintained officially, and that no other CLIP or tokenization method will be implemented. He mentioned it's because this is the most promising method out of everything else, and they have given up on developing those other methods, so you should leave him alone and let him cook.

The unfinished tile model is another jaw-dropping addition to this legendary work: it is a tile ControlNet model, and it creates large images, like 4K or even higher, by tiling smartly (a rough code sketch follows after the captions). Before, if you wanted to create an image at a very high resolution, some people tiled the image into different parts and then upscaled them. This created the problem of tile borders becoming very obvious after upscaling, and a lack of coherence between the tiles, since they were generated separately. On top of that, the prompt would also be a problem when upscaling with diffusion. To put it in author Lvmin's words: if your prompt is "a beautiful girl" and you split an image into 16 blocks and do diffusion in each block, you will get 16 beautiful girls rather than one beautiful girl; and if you use meaningless prompts like "clear, super clear, ultra clear" for some blocks, there is a possibility that content will be generated randomly, without an overall consistency dictating the entire image. ControlNet tile solves this problem by identifying and increasing the influence of the semantic target, and also decreasing the prompt's influence on subjects inside the image that are not related, so you can see the "a handsome man" prompt doesn't influence the upscale of these images. It is a very big-brain idea, and it is still in development.

And this is just the first official follow-up to ControlNet 1.0, already with this many new things added; we can make a religion out of this. As of the making of this video, the AUTOMATIC1111 SD web UI still hasn't implemented 1.1 and they are still working on it, but you can check this repo for any updates.

This video is brought to you by Skillshare. Skillshare is an online learning community with thousands of inspiring classes for creators. You can freely explore new skills, deepen existing passions, and have fun with your creativity. Whether you're interested in learning AI from scratch, picking up digital art, starting photography, or even doing video editing like I do for these YouTube videos, Skillshare has a class for you. Personally, I've been using Skillshare to level up my Photoshop skills. I was able to learn from highly experienced professionals in the field and apply these new skills directly to the work on this channel. One course I've been watching is an Adobe Photoshop CC advanced training course. It's been providing me with a lot of great Photoshop tips and bettering my understanding of how Photoshop works. It goes into incredible depth, which helps me use Photoshop more efficiently and teaches me a lot of things I didn't understand about masking. What's great about Skillshare is that you can learn at your own pace: when you're curious to learn new stuff, you can start by entering the topic and exploring to see if you can finish the total lecture time. This way you can easily plan your weekends to learn about the passion that you always wanted to start. What's even better is that they are currently also providing an offer of a one-month free premium trial, so you have plenty of time to check out their other amazing ad-free, high-quality classes. The first 1,000 people to use the link will get a one-month free trial of Skillshare, so you can start exploring your creativity today. Thank you guys for watching! A big shout out to Andrew, less chillius, Chris LeDoux, Alex Marie's, and many others who support me through Patreon or YouTube. Follow my Twitter if you haven't, and I'll see you all in the next one.
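As a closing note on the tile model discussed in the captions above, here is a minimal sketch of one common way to drive it from Python with the diffusers library. This is an illustrative assumption on my part, not the in-development A1111 integration the video refers to; the file name and target size are placeholders:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

# The 1.1 tile model conditions each denoising step on the (upsampled) input,
# so local regions stay faithful to the image instead of obeying the prompt alone.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

low_res = load_image("low_res.png")       # placeholder input image
upsampled = low_res.resize((1024, 1024))  # naive upsample first

result = pipe(
    "best quality",           # a generic prompt; the tile condition dominates
    image=upsampled,          # img2img starting point
    control_image=upsampled,  # the same image as the ControlNet condition
    strength=1.0,
).images[0]
result.save("upscaled.png")
```

The point of the sketch is the dual role of the input image: because the condition carries the per-region semantics, a global prompt like "a handsome man" no longer stamps its subject into every tile of the result.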
Info
Channel: bycloud
Views: 99,804
Keywords: bycloud, bycloudai, controlnet 1.1, stable diffusion, stable diffusion tutorial, controlnet 1.1 nightly, multicontrolnet, sd controlnet, toyxyz, controlnet 1.0, controlnet github, controlnet automatic1111, controlnet stable diffusion, controlnet tutorial, controlnet showcase, controlnet demo, controlnet introduction, controlnet guide, controlnet animation, controlnet intro, sd, automatic1111, control net tutorial, control net
Id: 15Q6OR0MWVk
Length: 9min 14sec (554 seconds)
Published: Mon May 22 2023