Here, we always talk about these amazing results
of recent AI techniques like this. This is Sora, but it is currently unreleased. That means we can
marvel at the results, but we cannot try them yet. However, oh my. The first results of
Stable Diffusion 3 are now available for us to look at. What is that? Stable
Diffusion is a free and open source/open model text to image AI that we can
all use for free. And interestingly, I also hear that version 3 builds on
Sora’s architecture. I’d love to see that. Previously, we talked about a version
called Stable Diffusion XL Turbo, and it was extremely fast. So fast that we don’t even measure it in
frames per second. Frames per second? No, sir! Cats per second is where it's at. And this
could generate a hundred cats per second. That is fantastic. However, the quality of the cats
was not as good as what I saw in other systems, like DALL-E 3. So, can we finally
get a free and open system that creates super high quality images?
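Before that, a small aside: SDXL Turbo is already publicly available, so if you would like to run your own cats-per-second experiment at home, a minimal sketch with the Hugging Face diffusers library might look something like this. The model id is the published one, but the prompt and the single-step settings are just my assumptions for a quick local test:

import torch
from diffusers import AutoPipelineForText2Image

# Load the publicly released SDXL Turbo checkpoint in half precision.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

# Turbo models are distilled for very few sampling steps, which is where the speed comes from.
image = pipe(
    prompt="a photo of a fluffy cat sitting on a windowsill",
    num_inference_steps=1,   # single-step sampling
    guidance_scale=0.0,      # Turbo is meant to run without classifier-free guidance
).images[0]
image.save("cat.png")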
Well, let’s have a look together! Dear Fellow Scholars, this is Two Minute
Papers with Dr. Károly Zsolnai-Fehér. Well, first, the quality and the
amount of detail in these images are absolutely incredible. But it gets better in three different ways. One, text. Remember the times when we told DALL-E
that we need a sign that says Deep Learning, and we got this? Well, those days are still
not over. We are still not out of the woods. Current systems do much better on text, but they can only handle short, rudimentary prompts, and we often have to run them 10 or more times to get something meaningful. This is DALL-E version 3 trying
the same, and we are still not there. But here, would you look at that. We get some text
on the chalkboard, or look at this. This is not just text slapped on top of an image; it is an integral part of the image itself. It also knows styles quite well: this could easily be a desktop background for many, and graffiti styles are also appearing. Now, not all the text in this image seems
perfect, and we don’t know how much cherry-picking was necessary to get these, but we will soon be
able to try it ourselves, and then, we will know. Two, understanding prompt structure. This is
going to be really tough. The prompt is “Three transparent glass bottles on a wooden table.
The one on the left has red liquid and the number 1. The one in the middle has blue liquid
and the number 2. The one on the right has green liquid and the number 3.”
And…there we go! Now, wait, I also ran this 10 times in
DALL-E 3 and I was really strict with it, and still it did extremely well. It was able to do
it 8 times out of 10. These were the good cases, and these were the failure cases.
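If you would like to repeat this little stress test yourself, a rough sketch with the official OpenAI Python client might look like this; the model name and image size follow the public API, but the loop and settings are just my own choices:

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# The exact prompt from above.
prompt = (
    "Three transparent glass bottles on a wooden table. "
    "The one on the left has red liquid and the number 1. "
    "The one in the middle has blue liquid and the number 2. "
    "The one on the right has green liquid and the number 3."
)

# Run the same prompt ten times and collect the image URLs for inspection.
for attempt in range(10):
    result = client.images.generate(
        model="dall-e-3", prompt=prompt, n=1, size="1024x1024"
    )
    print(f"attempt {attempt + 1}: {result.data[0].url}")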
Even those failure cases are not so bad: it just switched up the colors or added some extra text. So why is this interesting? Well, Stable Diffusion 3 can also do this, but it
is an open system that is free for all of us. And three, creativity. I love how it is
able to imagine new scenes that we’ve likely never seen before. It can
use its knowledge about existing things and extend that knowledge
into new situations. Loving it. If everything goes well, the paper will appear
in the next few days and I am also hoping to get access to the models soon. You know, images of
Fellow Scholars holding on to their papers need to be done. Subscribe and hit the bell icon if you
are interested in a deeper look when it arrives. However, we know some details. For instance,
the earlier Stable Diffusion 1.5 has about 1 billion parameters, and SDXL has about 3.5 billion. This new one ranges from 0.8 billion to 8 billion. So even the heavier version of this will still likely generate images in a matter of seconds, and the lighter version will, I think, easily run on the phone in your pocket. And to have this capability right in your pocket, my goodness. What a time to be alive!
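Why do I think the lighter version fits in your pocket? Here is the back-of-the-envelope arithmetic, assuming 16-bit weights; the real footprint also depends on the text encoders, activations and any quantization, so treat it only as a rough bound:

def weight_memory_gb(num_params, bytes_per_param=2):
    # Memory needed just to hold the weights, in gigabytes, at 2 bytes per parameter (fp16).
    return num_params * bytes_per_param / 1e9

print(weight_memory_gb(0.8e9))  # smallest variant: ~1.6 GB, plausibly phone territory
print(weight_memory_gb(8e9))    # largest variant:  ~16 GB, more of a workstation model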
And in the meantime, you can do a heck of a lot more with already existing tools. For instance, the Stability API can now help you with a great deal more than just text to image: you can get it to reimagine parts of a scene as well. And don't forget, StableLM also exists. That's free too. If everything goes well, we will soon talk about how you can run these free large language models privately at home. And we will talk about more amazing models: DeepMind's Gemini 1.5 Pro and, get this, a smaller, free version of it called Gemma that you can run at home for free. That video is coming soon too.
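And if you are curious what running one of these free language models privately at home can look like today, here is a minimal sketch using the Hugging Face transformers library. The checkpoint name is just one publicly released StableLM model that I am assuming here, and it may need a recent version of the library; a Gemma checkpoint can be swapped in the same way once you have access to it:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed open checkpoint; replace with another open model id if you prefer.
model_id = "stabilityai/stablelm-zephyr-3b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Generate a short completion entirely on your own machine.
prompt = "Explain, in two sentences, what a diffusion model does."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))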