Stable Diffusion 3 - Creative AI For Everyone!

Video Statistics and Information

Captions
Here, we always talk about these amazing results of recent AI techniques like this. This is Sora, but it is currently unreleased. That means we can marvel at the results, but we cannot try them yet. However, oh my. The first results of Stable Diffusion 3 are now available for us to look at. What is that? Stable Diffusion is a free and open source/open model text-to-image AI that we can all use for free. And interestingly, I also hear that version 3 builds on Sora's architecture. I'd love to see that.

Previously, we talked about a version called Stable Diffusion XL Turbo, and it was extremely fast. So fast that we don't even measure it in frames per second. Frames per second? No sir! Cats per second is where it's at. And this could generate a hundred cats per second. That is fantastic. However, the quality of the cats was not as good as what I saw in other systems, like DALL-E 3. So, can we finally get a free and open system that creates super high quality images? Well, let's have a look together!

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Well, first, the quality and the amount of detail in these images is absolutely incredible. But it gets better in 3 different ways.

One. Text. Remember the times when we told DALL-E that we need a sign that says Deep Learning, and we got this? Well, those days are still not over. We are still not out of the woods. Current systems do much better on text, but they can only do short, rudimentary prompts, and we often have to run them 10 or more times to get something meaningful. This is DALL-E version 3 trying the same, and we are still not there.

But here, would you look at that. We get some text on the chalkboard, or look at this. This is not just text slapped on top of an image; it is an integral part of the image itself.
It also knows styles quite well; this could easily be a desktop background for many, and graffiti styles are also appearing. Now, not all text on this image seems perfect, and we don't know how much cherry-picking was necessary to get these, but we will soon be able to try it ourselves, and then we will know.

Two. Understanding prompt structure. This is going to be really tough. The prompt is "Three transparent glass bottles on a wooden table. The one on the left has red liquid and the number 1. The one in the middle has blue liquid and the number 2. The one on the right has green liquid and the number 3." And… there we go!

Now, wait, I also ran this 10 times in DALL-E 3, and I was really strict with it, and still it did extremely well. It was able to do it 8 times out of 10. These were the good cases, and these were the failure cases. Even these are not so bad. It just switched up the colors or added some extra text. So why is this interesting? Well, Stable Diffusion 3 can also do this, but it is an open system that is free for all of us.

And three. Creativity. I love how it is able to imagine new scenes that we've likely never seen before. It can use its knowledge about existing things and extend that knowledge into new situations. Loving it.

If everything goes well, the paper will appear in the next few days, and I am also hoping to get access to the models soon. You know, images of Fellow Scholars holding on to their papers need to be done. Subscribe and hit the bell icon if you are interested in a deeper look when it arrives.

However, we know some details. For instance, the earlier Stable Diffusion 1.5 has about 1 billion parameters. SDXL is 3.5 billion. And this new one ranges from 0.8 billion to 8 billion. So even the heavier version of this will still likely generate images in a matter of seconds, and the lighter version will, I think, easily run on the phone in your pocket. And to have this capability right in your pocket, my goodness. What a time to be alive!

And in the meantime, you can do a heck of a lot more with already existing tools. For instance, the Stability API can now help you with a great deal more than just text to image. You can get it to reimagine parts of the scene as well.

And don't forget, StableLM also exists. That's free too. If everything goes well, we will talk about how you can run these free large language models privately at home soon. And we will talk about more amazing models: DeepMind's Gemini 1.5 Pro and, get this, a smaller, free version of it called Gemma that you can run at home for free. That video is coming soon too.
Info
Channel: Two Minute Papers
Views: 137,421
Keywords: ai, stable diffusion, sd3, stable diffusion 3
Id: PddEGvUFZDQ
Length: 6min 44sec (404 seconds)
Published: Mon Feb 26 2024