ControlNet Revolutionized How We Use AI To Generate Images

Captions
The idea that we have good control over text-to-image models has probably crossed our minds once or twice, given how well we can generate images now. And ever since Stability AI released Stable Diffusion 2.1, we were like, yay, depth-to-image is going to give us one more way to control image generation besides image-to-image and text-to-image. Yes, that was pretty amazing, but have you ever thought about accurate human pose to image, precise normal map to image, coherent semantic map to image, or even line art to image? Maybe something that can generalize the idea of anything-to-image; that would be game changing.

Let me introduce you to ControlNet, a neural network structure that controls large diffusion models in a way that supports additional input conditions much better than any existing method. This may sound like every other scribble-to-image or semantic-to-image model, but it is actually something much more generalizable, and it is definitely going to improve people's workflows by a lot.

It comes from the same author as Style2Paints, a five-year-old project that Lvmin Zhang developed to help artists colorize line art with AI. He explains that ControlNet copies the weights of the neural network blocks into a locked copy and a trainable copy. While the trainable copy learns your condition, the locked copy preserves your model. With this, training on a small dataset of image pairs will not destroy the production-ready diffusion model, and it can handle basically any input condition you train it on while generating images with the quality of the original model. With more control, higher-quality images can then be generated by the same models.

To make this more concrete, Stable Diffusion's new depth-to-image model only takes in a 64x64 depth map, while the 2.1 model itself is capable of generating a raw 512 or even 768 image. But with ControlNet, you can now input a 512x512 depth map, so the diffusion model can follow the depth map more accurately since it is higher resolution, and a better image can be generated.

ControlNet was built on the idea that text cannot fully handle every conditioning problem in image generation, because text and images are ideas that live in completely different dimensions, and with text doing the heavy lifting as our interface to diffusion models, I think you can relate that sometimes your ideas are hard to express efficiently in text too, right? And if only the AI could understand your pose image a bit better, it could save you so much time.

Let me just show you the results and you will understand. Just keep in mind that the official demos from ControlNet are all on Stable Diffusion 1.5, so the quality may differ significantly from what Stable Diffusion 2.1 can generate. But that is not because of ControlNet. Just look at the depth clarity compared to SD 2.0's official result. Even though ControlNet is controlling SD 1.5, the generated images are just a lot clearer, especially the background and the jaw of the old man, and I would not be able to tell the difference if they were unlabeled.

What's even better is that ControlNet reduces the training cost from 2,000 GPU hours with more than 12 million images down to a single RTX 3090 Ti for less than a week with only 200k training images. This can save so much money.
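To make the locked-copy/trainable-copy idea a bit more concrete, here is a minimal, illustrative sketch in PyTorch of how such a structure could be wired up. The class and function names (ControlledBlock, zero_conv) and the toy block are my own placeholders, not the actual ControlNet code; the key ideas are the frozen original weights, the trainable clone, and the zero-initialized convolutions that keep the locked model's behavior intact at the start of training.

    import copy
    import torch
    import torch.nn as nn

    def zero_conv(channels):
        # 1x1 convolution initialized to zero, so the control branch
        # contributes nothing at the start of training and the locked
        # model's behavior is preserved.
        conv = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(conv.weight)
        nn.init.zeros_(conv.bias)
        return conv

    class ControlledBlock(nn.Module):
        """Illustrative ControlNet-style wrapper (names are hypothetical)."""
        def __init__(self, pretrained_block: nn.Module, channels: int):
            super().__init__()
            # Locked copy: the original pretrained weights, frozen.
            self.locked = pretrained_block
            for p in self.locked.parameters():
                p.requires_grad = False
            # Trainable copy: a clone that learns the new condition.
            self.trainable = copy.deepcopy(pretrained_block)
            # Zero convolutions on the condition input and the control output.
            self.zero_in = zero_conv(channels)
            self.zero_out = zero_conv(channels)

        def forward(self, x, condition):
            # Locked path runs untouched; trainable path sees x plus the condition.
            locked_out = self.locked(x)
            ctrl_out = self.trainable(x + self.zero_in(condition))
            return locked_out + self.zero_out(ctrl_out)

    # Toy usage: a single conv block standing in for a diffusion encoder block.
    block = nn.Sequential(nn.Conv2d(4, 4, 3, padding=1), nn.SiLU())
    controlled = ControlledBlock(block, channels=4)
    x = torch.randn(1, 4, 64, 64)      # latent feature map
    cond = torch.randn(1, 4, 64, 64)   # encoded depth/pose/edge condition
    out = controlled(x, cond)          # identical to block(x) at initialization

Because the zero convolutions output nothing at initialization, the wrapped block behaves exactly like the original pretrained block until training starts shaping the control branch, which is why small paired datasets do not wreck the production model.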
Human pose to image looks so clean too; everything is synthesized around it perfectly, the anatomy makes sense, and the art generated is coherent, even though Michael Jackson is in mid-air in this one. The author specified that these are not cherry-picked results, and if you want to verify that, you can run the code yourself; it is open-sourced by a college student. Even with poses where all the limbs are folded or not included, the resulting images do not fall apart at all, and they obey the human pose input faithfully even in different contexts. The arms will be posed correctly, and it just feels so satisfying.

There are even more tests the author made just to show how generalizable ControlNet is, and they are all actually pretty amazing. Like using an HED boundary as the input reference: HED is one of the edge detection methods and preserves the edges that are highly contrasted in the input image, making it pretty suitable for recoloring and stylizing. There is also MLSD lines, another edge detection method that does line segment detection and can be used as a reference to generate scenery realistically, with layouts that make sense and details that are coherent. Or use Canny edges, which extract very detailed, complex edges for you (a quick preprocessing sketch follows below), so the generated art keeps those detailed attributes that normal text-to-image or image-to-image would not be able to achieve or preserve.

You are probably fed up with the amount of scribble-to-image and semantic segmentation demos on the internet, but those work pretty well too, so I'll just put them here as a quick mention. Normal map to image, though, is going to be interesting. Imagine using a normal map that you generated from ECON, which is the latest and, I think, the best image-to-mesh AI that I didn't have time to cover, and being able to use that as a reference input. This could be a very useful tool, similar to depth-to-image. Normal map to image can focus on the subject's coherency instead of the surroundings and the depth, so it can make edits to the subject more directly and maybe even give more control for editing the background too.

But to be honest, the highlight of this is definitely the line art colorizing method that the author originally proposed for Style2Paints V5. The reason we have not seen any method like this before is that current image-to-image methods struggle to preserve line art details and would not work as a viable colorizing tool for black-and-white artwork, where you have to follow the outlines faithfully. ControlNet is probably what Style2Paints V5 is based on, and it would do exactly that: accurately preserve the details, just like the other edge-detection-to-image inputs do. However, he has not released the colorization tool yet due to technical issues and ethical concerns, but it will probably be released once he finishes improving the tools and has ways to tackle the ethical aspects. Then maybe I'll make a video about it again.

This research is definitely going to change how the big five train and control their large diffusion models, and with its GitHub page getting 300 stars in just under 24 hours without any promotion, it is safe to say that Lvmin's work is going to be worth millions of dollars to these companies. I'll link his paper down in the description, and to quote one of my Discord members: "I read this paper and it was insane, and Lvmin is too good for Stanford."
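As a side note on the Canny edge condition mentioned above, here is a minimal sketch of how you might extract such an edge map with OpenCV before feeding it to a ControlNet-style model. The file names and threshold values are placeholders, not anything from the paper.

    import cv2
    import numpy as np

    # Load the reference image (the path is a placeholder).
    image = cv2.imread("reference.png")
    image = cv2.resize(image, (512, 512))

    # Canny extracts thin, detailed edges; the two thresholds control how
    # much fine structure is kept (these values are just a starting point).
    edges = cv2.Canny(image, threshold1=100, threshold2=200)

    # Conditioning images are usually expected as 3-channel inputs,
    # so stack the single-channel edge map before saving it.
    edges_rgb = np.stack([edges] * 3, axis=-1)
    cv2.imwrite("canny_condition.png", edges_rgb)

The resulting edge map can then be used as the condition image, which is what lets the generated art keep fine structural details that plain text-to-image would lose.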
That quote is pretty funny, and join my Discord if you haven't. This opens up realistic possibilities for artistic use, architectural rendering, design brainstorming, storyboarding, and so much more. Even black-and-white image colorization may now be possible with diffusion with extreme accuracy, because you can specify the day and age of the image so that it gets colored very precisely. Not to mention image restoration, which is probably going to be possible with diffusion now too, thanks to ControlNet. He also made a page on training your own model and use case with ControlNet, so check it out if you're interested.

Or check out today's sponsor, OpenCV, if you are also interested in generating AI art. Yes, you heard that right: the computer vision organization OpenCV decided to sponsor this video to promote their first ever in-depth AI art course, which will cover the basic and advanced topics related to generating AI art. Not gonna lie, it took me by surprise too, but OpenCV has a really good track record of coding courses, ranging from a few hours to a few months, that teach you how to master computer vision, PyTorch, TensorFlow, and even an advanced course on real-world CV applications. If you haven't seen them, even the free ones are pretty well taught, especially in how they cover pretty much everything OpenCV has to offer. So if they make an AI art course, I think it'll be pretty high quality too.

Right now, they are launching a Kickstarter on February 14th to fund their AI art course so that they can spend time developing the best AI course they can. Previously, they were able to raise a total of $3 million for various courses and projects, and this AI art course is the next one they are planning to venture into. The pricing, of course, will be relatively lower than their OpenCV courses, as it will be a course that can be completed in a few weekends. To celebrate their Kickstarter launch, they are also hosting an AI art generation contest with the prize of an iPad Air. So definitely join the contest if you are interested in getting a free iPad, and check out their Kickstarter page for more information about their AI course.

Thank you so much for watching as usual. A big shout-out to Andrew Leschevias, Chris Ladoo, and many others who support me through Patreon or YouTube. Follow my Twitter if you haven't, and I'll see you all in the next one.
Info
Channel: bycloud
Views: 98,980
Keywords: bycloud, bycloudai, controlnet, style2paints v5, line art to image, ai art colorization, anything to image, depth to image, edge to image, pose to image, ai pose to image, pose to image ai, normal map to image, line art colorization ai, line art colorization, control net, controlnet + diffusion, controlnet ai, controlnet diffusion, text to image depth model, diffusion model, ai colorization, text to image colorization, style2paints, lvmin zhang, controlnet stable diffusion
Id: rCygkyMuSQo
Length: 8min 8sec (488 seconds)
Published: Tue Feb 14 2023