Great paper today, Fellow Scholars! Stable
Diffusion XL Turbo. Why? Well, because today, we have these amazing computer games
and simulations that run quickly, and we measure this in frames per second. Then,
we have off-line simulations that run much slower, in seconds per frame. And today, we have
an AI technique that we can measure in cats per second. You can create so many cats
per second, and it can do this too. And, surprisingly, it may even help us train
self-driving cars. We’ll look into that. And there’s more: you can also try this new
tool right now. And we will talk about this amazing new technique too, which you can
also try for free! How cool is that? So, what was this cat thing? This is Stable
Diffusion XL Turbo, a supposedly quicker version of Stable Diffusion, the popular open
source text to image AI. The original version can do absolutely amazing things, but it takes
a bit. About 20 to 60 seconds for an image. And this depends on this setting. The number of
sampling steps. We typically need 20 to 50 steps to create a high-quality image. The more steps,
the more computation we have to do, and thus, the longer we have to wait. And here is an
amazing new paper that promises…what? Can that really be? 1-4 sampling steps, often in
a single step. That sounds incredible. I mean, if this was true, we would be able to perform
text to image in… real time. Yes. Real time! But wait a second. This is not new.
Creating an image in 1-4 sampling steps has never been a problem. You can do
it any time you want with Stable Diffusion, but unfortunately, then you get this. A blurry
image. No detail. So, why is this interesting? Dear Fellow Scholars, this is Two Minute
Papers with Dr. Károly Zsolnai-Fehér. Well, it is interesting because this new paper
allows you to create images quickly, and at the same time, gives you high-quality images. Now
let’s have a look at the new technique. Wow, that is as fast as you can type. The results update
almost immediately. No more blurry images! Wow. So, how quick is it? Well, hold on to your papers
Fellow Scholars, because it can create an image in 9-10 milliseconds. Yes, that is a hundred
cats per second. The resolution is 512x512 and the quality is not bad at all - it typically
loses only against a slower version of itself, SDXL. But SDXL has been surpassed
by a new text to image technique, yes, we will have a look at that too in a
moment. So quality, checkmark, but following your prompts closely is also super important.
And in that area, checkmark too. Excellent. And there is so much more here. It can
perform not only text to image, but also image to image translation. One image goes
in, and it comes out transformed. We have seen this in Stable Diffusion before, and this helps
you unleash your creativity like never before. Remember this earlier NVIDIA paper
where you could draw a landscape, and it would almost immediately give
you a nearly photorealistic image? Now it can do that too, and
not only with landscape images, but with Apple’s memojis too. I love the
quick iteration speed here. And in fact, people are already using it out
there in the wild. Let’s see how. Here you see an incredible example of real-time
urban planning, prototyping and visualization. And you can even create animations with
it. All this for free and open source. So, how is this even possible? It is possible
through a technique called Adversarial Diffusion Distillation. Luckily, we have the paper with a
detailed description of the phenomenon. So here’s how you do it: first, train a complex diffusion
model. This starts out from a noisy image, then over time, learns to reorganize this noise
into an image that depicts our text prompt. But it does this slowly. Let’s call it the
teacher model. Now comes the magic! We now create a smaller student model that tries to
mimic its teacher. It learns how the teacher behaves and tries to reproduce its behavior.
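
To make the teacher and student idea concrete, here is a minimal toy sketch in Python. It is not the paper's method: it covers only the "student mimics the teacher" part, leaves out the adversarial part of Adversarial Diffusion Distillation entirely, and works on single numbers instead of images. Every name in it (TARGET, teacher, train_student, student) is hypothetical and chosen just for this illustration.

```python
import random

TARGET = 2.0  # hypothetical "clean" value that the teacher denoises toward


def teacher(x, steps=50, rate=0.1):
    """Slow multi-step model: nudges a noisy value toward TARGET one step at a time."""
    for _ in range(steps):
        x = x + rate * (TARGET - x)
    return x


def train_student(samples, epochs=200, lr=0.05):
    """Fit a one-step student y = w*x + b to the teacher's outputs (squared error)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x in samples:
            error = (w * x + b) - teacher(x)  # student output minus teacher output
            w -= lr * error * x               # gradient of 0.5 * error**2 w.r.t. w
            b -= lr * error                   # gradient of 0.5 * error**2 w.r.t. b
    return w, b


random.seed(0)
noisy_inputs = [random.uniform(-1.0, 1.0) for _ in range(100)]
w, b = train_student(noisy_inputs)


def student(x):
    """Fast model: one multiply-add instead of fifty refinement steps."""
    return w * x + b


# Largest disagreement between student and teacher on the training inputs.
max_gap = max(abs(student(x) - teacher(x)) for x in noisy_inputs)
print(f"largest student/teacher gap: {max_gap:.6f}")
```

On this toy problem the gap should end up very small, because the teacher here happens to be a function that the one-step student can represent exactly. Real image models are far harder to compress, which is where the extra training signals described in the paper come in.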
But wait - we already have the teacher model, why copy it? Well, we are copying it with this
student neural network, so we retain the quality, but at the same time, this student
network will be much cheaper and faster. So, more corgis and cats, cheaper and faster.
Now, hold on to your papers Fellow Scholars,
because perhaps this could also be used to train self-driving cars. How? Well, look at this
cool new paper from NVIDIA, where they use real driving logs to analyze previous situations and
even create new ones. Now here, all of these agents are controlled by NVIDIA’s AI, and…are
you thinking what I am thinking? Oh yeah, just imagine putting this into an image to image
translator AI two more papers down the line, and bam, you have a simulation where you can safely
train your cars in challenging situations that actually happened or may happen. Imagine this in
a similar manner to this earlier work where we can go from video game graphics to real life, and
back. But this time with real driving situations. And this new paper can tokenize trajectories,
meaning that it breaks down complex
driving situations much like you would break down a sentence into words,
and then letters. And it does it very, very well. How well? Look. It is able to create
more lifelike scenarios, outperforming many, many previous techniques. Like a mini video game with
intelligent AI players. What a time to be alive! Now, as promised, we are going to have
a look at this new text to image AI that looked at 1.1 billion images, and learned to
create incredibly high quality outputs for your text prompts. You can try it here, the
link is available in the video description. So, how good is it? Well, let’s test it against
Stable Diffusion XL. Look at that! Approximately 6 to 7 times out of 10 it is preferred over SDXL.
I think that is insane. Don’t forget, SDXL is a paper that came out approximately 5 months
ago, and it has already been surpassed. Bravo! And while we are looking at some of
these eye-poppingly beautiful images, just imagine what comes two more papers down the line.
I am sure that we are going to be looking at images and videos like these created in real time,
and all you need to provide is just a text prompt.
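
As a quick sanity check on the "cats per second" figure quoted earlier, here is a small Python sketch that converts the 9-10 milliseconds per image into images per second. It assumes images are generated strictly one after another with no batching or other overhead, and the function name is made up for this illustration.

```python
def images_per_second(latency_ms):
    """Convert a per-image latency in milliseconds into images per second."""
    return 1000.0 / latency_ms


low = images_per_second(10.0)   # slower end of the quoted 9-10 ms range
high = images_per_second(9.0)   # faster end of the quoted 9-10 ms range
print(f"{low:.0f} to {high:.0f} images per second")
# prints "100 to 111 images per second"
```

So "a hundred cats per second" is the conservative end of that range.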