Stable Diffusion AI: 100 Cats Per Second…For Free!

Video Statistics and Information

Captions

Great paper today, Fellow Scholars! Stable Diffusion XL Turbo. Why? Well, because today, we have these amazing computer games and simulations that run quickly, and we measure this in frames per second. Then, we have offline simulations that run much slower, in seconds per frame. And today, we have an AI technique that we can measure in cats per second. You can create so many cats per second, and it can do this too. And, surprisingly, it may even help us train self-driving cars. We’ll look into that. And there’s more: you can also try this new tool right now. And we will talk about this amazing new technique too, which you can also try for free! How cool is that?
So, what was this cat thing? This is Stable Diffusion XL Turbo, a supposedly quicker version of Stable Diffusion, the popular open source text to image AI. The original version can do absolutely amazing things, but it takes a bit: about 20 to 60 seconds for an image. And this depends on this setting: the number of sampling steps. We typically need 20 to 50 steps to create a high-quality image. The more steps, the more computation we have to do, and thus, the longer we have to wait. And here is an amazing new paper that promises…what? Can that really be? 1-4 sampling steps, often in a single step. That sounds incredible. I mean, if this was true, we would be able to perform text to image in… real time. Yes. Real time!
But wait a second. This is not new. Creating an image in 1-4 sampling steps has never been a problem. You can do it any time you want with Stable Diffusion, but, unfortunately, then you get this. A blurry image. No detail. So, why is this interesting? Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Well, it is interesting because this new paper allows you to create images quickly, and at the same time, gives you high-quality images. Now let’s have a look at the new technique. Wow, that is as fast as you can type. The results update almost immediately.
No more blurry images! Wow. So, how quick is it? Well, hold on to your papers, Fellow Scholars, because it can create an image in 9-10 milliseconds. Yes, that is a hundred cats per second. The resolution is 512x512 and the quality is not bad at all - it typically loses only against a slower version of itself, SDXL. But SDXL has been surpassed by a new text to image technique. Yes, we will have a look at that too in a moment. So quality, checkmark, but following your prompts closely is also super important. And in that area, checkmark too. Excellent.
And there is so much more here. It can perform not only text to image, but also image to image translation. One image goes in, and it comes out transformed. We have seen this in Stable Diffusion before, and this helps you unleash your creativity like never before. Remember this earlier NVIDIA paper where you could draw a landscape, and it would almost immediately give you a nearly photorealistic image? Now it can do that too, and not only with landscape images, but with Apple’s memojis too. I love the quick iteration speed here. And in fact, people are already using it out there in the wild. Let’s see how. Here you see an incredible example of real-time urban planning, prototyping and visualization. And you can even create animations with it. All this for free and open source.
So, how is this even possible? It is possible through a technique called Adversarial Diffusion Distillation. Luckily, we have the paper with a detailed description of the phenomenon. So here’s how you do it: first, train a complex diffusion model. This starts out from a noisy image, then over time, learns to reorganize this noise into an image that depicts our text prompt. But it does this slowly. Let’s call it the teacher model. Now comes the magic! We now create a smaller student model that tries to mimic its teacher. It learns how the teacher behaves and tries to reproduce its behavior. But wait - we already have the teacher model, why copy it?
Well, we are copying it with this student neural network, so we retain the quality, but at the same time, this student network will be much cheaper and faster. So, more corgis and cats, cheaper and faster.
Now, hold on to your papers, Fellow Scholars, because perhaps this could also be used to train self-driving cars. How? Well, look at this cool new paper from NVIDIA, where they use real driving logs to analyze previous situations and even create new ones. Now here, all of these agents are controlled by NVIDIA’s AI, and…are you thinking what I am thinking? Oh yeah, just imagine putting this into an image to image translator AI two more papers down the line, and bam, you have a simulation where you can safely train your cars in challenging situations that actually happened or may happen. Imagine this in a similar manner to this earlier work where we can go from video game graphics to real life, and back. But this time with real driving situations. But this new paper can tokenize trajectories, meaning that it breaks down complex driving situations much like you would break down a sentence into words, and then letters. And it does it very, very well. How well? Look. It is able to create more lifelike scenarios, outperforming many, many previous techniques. Like a mini video game with intelligent AI players. What a time to be alive!
Now, as promised, we are going to have a look at this new text to image AI that looked at 1.1 billion images, and learned to create incredibly high quality outputs for your text prompts. You can try it here, the link is available in the video description. So, how good is it? Well, let’s test it against Stable Diffusion XL. Look at that! Approximately 6 to 7 times out of 10 it is preferred over SDXL. I think that is insane. Don’t forget, SDXL is a paper that came out approximately 5 months ago, and it has already been surpassed. Bravo!
And while we are looking at some of these eye-poppingly beautiful images, just imagine: two more papers down the line, I am sure that we are going to be looking at images and videos like these created in real time, and all you need to provide is just a text prompt.
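
The teacher and student idea from the captions can be illustrated with a toy sketch. This is not the paper's method: there is no adversarial loss, no neural network and no real diffusion model here. The "teacher" is a hypothetical 50-step loop that nudges a noisy number toward a target value, and the "student" is a single-step linear map fitted to the teacher's outputs. All names and parameters (`teacher_denoise`, `fit_student`, `rate`, `target`) are assumptions made for illustration only.

```python
import random


def teacher_denoise(noisy, target=1.0, steps=50, rate=0.1):
    """Slow toy teacher: moves the sample a little toward `target` on every step."""
    x = noisy
    for _ in range(steps):
        x += rate * (target - x)
    return x


def fit_student(inputs, outputs):
    """Fit a one-step student y = a*x + b to the teacher's outputs (least squares)."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    var_x = sum((x - mean_x) ** 2 for x in inputs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


# Build a training set by running the slow teacher on random noisy inputs.
random.seed(0)
noisy_samples = [random.gauss(0.0, 1.0) for _ in range(200)]
teacher_outputs = [teacher_denoise(x) for x in noisy_samples]

# The student only ever sees (input, teacher output) pairs.
a, b = fit_student(noisy_samples, teacher_outputs)


def student_denoise(noisy):
    """Fast toy student: one multiply and one add instead of 50 steps."""
    return a * noisy + b


test_input = 0.3
gap = abs(teacher_denoise(test_input) - student_denoise(test_input))
print(f"teacher (50 steps): {teacher_denoise(test_input):.6f}")
print(f"student (1 step):   {student_denoise(test_input):.6f}")
print(f"gap is tiny: {gap < 1e-9}")
```

In this toy the student can match the teacher almost exactly because the teacher's 50 steps collapse into a single linear map. A real image model has no such shortcut, which is why the captions treat high-quality single-step results as a surprise.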

Info

Channel: Two Minute Papers

Views: 96,012

Keywords: ai, sdxl turbo, stable diffusion turbo, stable diffusion xl turbo, stable diffusion xl, nvidia

Id: Iol2rb65aSk

Length: 8min 20sec (500 seconds)

Published: Sat Dec 23 2023