Stable Diffusion AI: 100 Cats Per Second…For Free!

Video Statistics and Information

Captions
Great paper today, Fellow Scholars! Stable Diffusion XL Turbo. Why? Well, because today we have these amazing computer games and simulations that run quickly, and we measure them in frames per second. Then we have offline simulations that run much slower, in seconds per frame. And today, we have an AI technique that we can measure in cats per second. You can create so many cats per second, and it can do this too. And, surprisingly, it may even help us train self-driving cars. We'll look into that. And there's more: you can try this new tool right now. We will also talk about another amazing new technique, and you can try that one for free too! How cool is that?

So, what was this cat thing? This is Stable Diffusion XL Turbo, a supposedly quicker version of Stable Diffusion, the popular open source text to image AI. The original version can do absolutely amazing things, but it takes a while: about 20 to 60 seconds per image. This depends on one setting, the number of sampling steps. We typically need 20 to 50 steps to create a high-quality image. The more steps, the more computation we have to do, and thus, the longer we have to wait. And here is an amazing new paper that promises…what? Can that really be? 1 to 4 sampling steps, often just a single step. That sounds incredible. I mean, if this were true, we would be able to perform text to image in… real time. Yes. Real time!

But wait a second. This is not new. Creating an image in 1 to 4 sampling steps has never been a problem. You can do it any time you want with Stable Diffusion, but unfortunately, then you get this: a blurry image with no detail. So, why is this interesting?

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Well, it is interesting because this new paper lets you create images quickly and, at the same time, gives you high-quality images. Now let's have a look at the new technique.
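The step-count trade-off described above can be illustrated with a toy, one-dimensional diffusion sampler. This is a minimal sketch under loudly stated assumptions, not Stable Diffusion: the "dataset" is just two sharp points, -1 and +1; noise is added as x_t = x_0 + sigma * eps; and the hypothetical `denoise` function is the exact optimal denoiser for that toy data, E[x_0 | x_t] = tanh(x_t / sigma^2), standing in for one forward pass of the neural network. Each sampling step costs one call to it, and the update is a deterministic DDIM-style (Euler) step.

```python
"""Toy 1-D diffusion sampler: cost vs. quality as the step count changes.

Illustration only, under these assumptions (none of this is Stable Diffusion):
- The "dataset" is two sharp points, -1 and +1.
- Noise is added as x_t = x_0 + sigma * eps (variance-exploding setup).
- `denoise` is the exact optimal denoiser for this toy data and stands in
  for one forward pass of the neural network.
"""
import math
import random


def denoise(x: float, sigma: float) -> float:
    """Best guess of the clean sample given a noisy one: E[x_0 | x_t]."""
    return math.tanh(x / sigma**2)


def sample(num_steps: int, rng: random.Random,
           sigma_max: float = 10.0, sigma_min: float = 0.01) -> tuple[float, int]:
    """Deterministic DDIM-style sampler. Returns (sample, network calls)."""
    # Geometric noise levels from sigma_max down toward sigma_min, then 0.
    sigmas = [sigma_max * (sigma_min / sigma_max) ** (i / num_steps)
              for i in range(num_steps)] + [0.0]
    x = sigma_max * rng.gauss(0.0, 1.0)  # start from noise
    calls = 0
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = denoise(x, sigma)  # one "network" evaluation per step
        calls += 1
        # Keep the predicted clean part and shrink the remaining noise.
        x = x0_hat + (sigma_next / sigma) * (x - x0_hat)
    return x, calls


def mean_abs(num_steps: int, n: int = 1000, seed: int = 0) -> float:
    """Average |sample|: near 0 means 'blurry average', near 1 means 'sharp'."""
    rng = random.Random(seed)
    return sum(abs(sample(num_steps, rng)[0]) for _ in range(n)) / n


for steps in (1, 4, 50):
    print(f"{steps:>2} steps: mean |x| = {mean_abs(steps):.3f}")
```

With a single step, the sampler can only return the denoiser's best guess from pure noise, which sits close to 0, the average of the two "images"; that is the one-dimensional analogue of the blurry result. More steps push the outputs out to -1 or +1, but every extra step is one more network call. This toy is far easier than real images, so it recovers after only a few steps; it shows the mechanism, not Stable Diffusion's actual step requirements.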
Wow, that is as fast as you can type. The results update almost immediately. No more blurry images! Wow.

So, how quick is it? Well, hold on to your papers, Fellow Scholars, because it can create an image in 9 to 10 milliseconds. Yes, that is a hundred cats per second. The resolution is 512x512, and the quality is not bad at all: it typically loses only against a slower version of itself, SDXL. But SDXL has already been surpassed by a new text to image technique, and yes, we will have a look at that too in a moment. So quality, checkmark. But following your prompts closely is also super important, and in that area, checkmark too. Excellent.

And there is so much more here. Not only can it perform text to image, but also image to image translation: one image goes in, and it comes out transformed. We have seen this in Stable Diffusion before, and it helps you unleash your creativity like never before.

Remember this earlier NVIDIA paper where you could draw a landscape, and it would almost immediately give you a nearly photorealistic image? Now it can do that too, and not only with landscape images, but with Apple's Memoji characters too. I love the quick iteration speed here. And in fact, people are already using it out there in the wild. Let's see how.

Here you see an incredible example of real-time urban planning, prototyping, and visualization. And you can even create animations with it. All this for free, with openly available model weights.

So, how is this even possible? It is possible through a technique called Adversarial Diffusion Distillation. Luckily, the paper describes the method in detail. So here's how you do it: first, train a complex diffusion model. It starts out from a noisy image and, step by step, learns to reorganize this noise into an image that depicts our text prompt. But it does this slowly. Let's call it the teacher model. Now comes the magic!
We now create a student model that tries to mimic its teacher. It learns how the teacher behaves and tries to reproduce its behavior. But wait, we already have the teacher model, so why copy it? Well, the student learns to produce a comparable image in just 1 to 4 sampling steps instead of 20 to 50, so we retain much of the quality, and at the same time, the student is much cheaper and faster to run.

So: more corgis and cats, cheaper and faster.

Now, hold on to your papers, Fellow Scholars, because perhaps this could also be used to train self-driving cars. How? Well, look at this cool new paper from NVIDIA, where they use real driving logs to analyze previous situations and even create new ones. Here, all of these agents are controlled by NVIDIA's AI, and… are you thinking what I am thinking? Oh yeah, just imagine putting this into an image to image translator AI two more papers down the line, and bam, you have a simulation where you can safely train your cars in challenging situations that actually happened or may happen. Imagine this in a similar manner to this earlier work where we can go from video game graphics to real life, and back. But this time, with real driving situations.

And this new paper can also tokenize trajectories, meaning that it breaks down complex driving situations into small pieces, much like you would break down a sentence into words, and then letters. And it does this very, very well. How well? Look. It creates more lifelike scenarios, outperforming many previous techniques. Like a mini video game with intelligent AI players. What a time to be alive!

Now, as promised, we are going to have a look at this new text to image AI that looked at 1.1 billion images and learned to create incredibly high-quality outputs for your text prompts. You can try it here; the link is available in the video description.

So, how good is it? Well, let's test it against Stable Diffusion XL. Look at that!
Approximately 6 to 7 times out of 10, it is preferred over SDXL. I think that is insane. Don't forget, SDXL is a paper that came out only about 5 months ago, and it has already been surpassed. Bravo!

And while we are looking at some of these eye-poppingly beautiful images, just imagine: two more papers down the line, I am sure we are going to be looking at images and videos like these created in real time, and all you need to provide is a text prompt.
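The teacher-student idea behind Adversarial Diffusion Distillation can be caricatured in a toy one-dimensional setting. This is a hedged sketch, not the ADD paper's training procedure. It swaps in a simpler technique, plain regression distillation: a hypothetical one-parameter student, tanh(a * z), learns to reproduce in a single call what a 50-step teacher sampler produces from the same noise z. The real method also trains against a discriminator (the "adversarial" part), uses its own distillation loss, and works with full image networks; all of that is omitted here. The two-point dataset {-1, +1} and the exact `denoise` function are illustrative assumptions.

```python
"""Toy teacher-student distillation in 1-D (illustration only, not the ADD code).

Assumptions:
- Toy "dataset": two sharp points, -1 and +1; `denoise` is its exact optimal
  denoiser, standing in for a trained diffusion network.
- Teacher: a deterministic 50-step sampler (50 network calls per sample).
- Student: a hypothetical one-parameter model, tanh(a * z), mapping noise z
  straight to a sample in a single call.
- Training signal: plain regression onto the teacher's outputs. ADD itself
  also uses an adversarial (discriminator) loss, which is omitted here.
"""
import math
import random

SIGMA_MAX, SIGMA_MIN = 10.0, 0.01


def denoise(x: float, sigma: float) -> float:
    """Exact E[x_0 | x_t] for the two-point toy dataset."""
    return math.tanh(x / sigma**2)


def teacher(z: float, num_steps: int = 50) -> float:
    """Many-step sampler: slow, but lands close to -1 or +1."""
    sigmas = [SIGMA_MAX * (SIGMA_MIN / SIGMA_MAX) ** (i / num_steps)
              for i in range(num_steps)] + [0.0]
    x = SIGMA_MAX * z  # start from noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = denoise(x, sigma)
        x = x0_hat + (sigma_next / sigma) * (x - x0_hat)
    return x


def student(z: float, a: float) -> float:
    """One call per sample."""
    return math.tanh(a * z)


def distill(num_samples: int = 2000, epochs: int = 300,
            lr: float = 2.0, seed: int = 0) -> float:
    """Fit the student's parameter `a` to mimic the teacher (mean squared error)."""
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    targets = [teacher(z) for z in zs]  # the expensive part, paid once
    a = 1.0
    for _ in range(epochs):
        grad = 0.0
        for z, y in zip(zs, targets):
            s = student(z, a)
            grad += 2.0 * (s - y) * (1.0 - s * s) * z  # d/da of (s - y)**2
        a -= lr * grad / num_samples  # plain gradient descent step
    return a


def mean_abs(xs: list[float]) -> float:
    return sum(abs(x) for x in xs) / len(xs)


a = distill()
test_rng = random.Random(1)
test_zs = [test_rng.gauss(0.0, 1.0) for _ in range(1000)]
print("naive 1-step sampler:", round(mean_abs([teacher(z, 1) for z in test_zs]), 3))
print("distilled student   :", round(mean_abs([student(z, a) for z in test_zs]), 3))
print("50-step teacher     :", round(mean_abs([teacher(z) for z in test_zs]), 3))
```

The expensive teacher runs are paid once, during training. Afterwards, each student sample costs one call instead of fifty, and its outputs stay much closer to the sharp values of -1 and +1 than those of the naive one-step sampler, which collapse toward the blurry average near 0.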
Info
Channel: Two Minute Papers
Views: 96,012
Keywords: ai, sdxl turbo, stable diffusion turbo, stable diffusion xl turbo, stable diffusion xl, nvidia
Id: Iol2rb65aSk
Length: 8min 20sec (500 seconds)
Published: Sat Dec 23 2023