Dear Fellow Scholars, this is Two Minute
Papers with Dr. Károly Zsolnai-Fehér. Today we are going to look at images, and
even videos that were created by an AI, and honestly, I cannot believe
how good some of these are. And is it possible that you can run this AI
at home yourself? You'll find out today. This year, through OpenAI's DALL-E 2, we are
entering the age of AI-driven image generation. Most of these techniques take a text
prompt, which means that we can write whatever we wish to see on the
screen, and first, a noise pattern appears that slowly morphs into
exactly what we are looking for. This is what we mean when we talk
about diffusion-based models. Now, OpenAI's DALL-E 2 can create
incredibly creative images, and Google’s Parti and Imagen AIs
are at least as good. Sometimes they even win linguistic
battles against OpenAI's solution. But there is a problem. All of them are missing
something: the model weights and the source code. This means
that these are all closed solutions, and we cannot pop the hood and look around inside. But now, here is a new solution called
Stable Diffusion, where the model weights and the full source code are available. I
cannot overstate how amazing this is. So, to show you why, here are two reasons. Reason number one is that with this, we
can finally take out our digital wrench and tinker with it. For instance, we
can now adjust the internal parameters in ways that we cannot with
closed solutions like DALL-E 2 and Imagen.
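To make that concrete, here is a minimal sketch of what such tinkering can look like, assuming the Hugging Face diffusers library, a CUDA-capable GPU, and the publicly released CompVis/stable-diffusion-v1-4 checkpoint; the prompt, seed, and parameter values are illustrative assumptions, not anything from the video.

```python
# Minimal text-to-image sketch with the open Stable Diffusion weights.
# Assumes: pip install torch diffusers transformers, a CUDA GPU, and the
# CompVis/stable-diffusion-v1-4 checkpoint from the Hugging Face Hub.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Because the model is open, we can swap internal pieces, e.g. the sampler...
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# ...and turn knobs that closed APIs keep hidden.
generator = torch.Generator("cuda").manual_seed(42)   # reproducible starting noise
image = pipe(
    "a photorealistic treehouse in a misty forest, golden hour",
    num_inference_steps=50,   # how many denoising steps to run
    guidance_scale=7.5,       # how strongly to follow the text prompt
    generator=generator,
).images[0]
image.save("treehouse.png")
```

Change the seed, the step count, the guidance scale, or even the sampler, and you get a different take on the same idea; with a closed solution, most of these knobs are simply out of reach.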
So now, let's have a look together at 10 absolutely amazing examples of what it can do! After that, I'll tell you about reason
number two of how it gets even better. One, dreaming. Since the internal parameters
are exposed, we can add small changes to them, create a bunch of similar outputs,
and then stitch these images together as a video. This is so much better for exploring
ideas. Just imagine that you get an image that is almost what you are looking for,
but the framing, or the shape of the doggy, is not quite perfect. Well, you won't need to throw
out these almost-perfect solutions anymore. Look at that. With this, we can make the perfect good boy so much more easily. I absolutely love it. Wow.
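As a rough illustration of the idea, and assuming the same diffusers setup as before, one way to "dream" around an almost-perfect result is to keep its starting noise and nudge it a tiny bit for every new output. The prompt, seed, and perturbation strength below are hypothetical, not the exact recipe used in the clip.

```python
# Sketch: re-render small perturbations of one starting noise tensor,
# so every output stays close to an almost-perfect image.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a corgi puppy sitting in a field of flowers, studio photo"  # assumed prompt
shape = (1, pipe.unet.config.in_channels, 64, 64)  # latent size for 512x512 images
base = torch.randn(shape, generator=torch.Generator("cuda").manual_seed(7),
                   device="cuda", dtype=torch.float16)

eps = 0.1  # how far each variant strays from the base noise (assumed value)
for i in range(24):
    fresh = torch.randn_like(base)
    latent = (1 - eps) ** 0.5 * base + eps ** 0.5 * fresh  # keeps variance near 1
    image = pipe(prompt, latents=latent, num_inference_steps=30).images[0]
    image.save(f"dream_{i:03d}.png")  # stitch the frames into a video afterwards
```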
Two, interpolation. Now hold on to your papers,
because we can even create a beautiful visual novel like this one by entering a bunch
of prompts, like the ones you see here, and we don't go from one image to the
next in one jarring jump; instead, each image is morphed into the next one,
creating these amazing transitions. By the way, the links to all of these materials
are available in the video description, including a link to the full version
of the video that you see here.
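For the curious, here is one way such a prompt-to-prompt morph can be scripted. This is a rough sketch, assuming a diffusers version that accepts pre-computed prompt_embeds; the prompts, seeds, and the embed and slerp helpers are my own illustrative assumptions, not the code behind the clips.

```python
# Sketch: morph between two prompts by interpolating their text embeddings
# (and the starting noise), rendering one frame per interpolation step.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

def embed(prompt):
    # Encode a prompt with the pipeline's own tokenizer and text encoder.
    tokens = pipe.tokenizer(prompt, padding="max_length",
                            max_length=pipe.tokenizer.model_max_length,
                            truncation=True, return_tensors="pt")
    return pipe.text_encoder(tokens.input_ids.to("cuda"))[0]

def slerp(a, b, t):
    # Spherical interpolation keeps the noise statistics roughly intact.
    a_n, b_n = a / a.norm(), b / b.norm()
    omega = torch.acos((a_n * b_n).sum().clamp(-1, 1))
    return (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)

emb_a = embed("a castle on a hill at dawn")    # assumed prompts
emb_b = embed("a castle on a hill at night")
shape = (1, pipe.unet.config.in_channels, 64, 64)
gen = torch.Generator("cuda")
lat_a = torch.randn(shape, generator=gen.manual_seed(1), device="cuda", dtype=torch.float16)
lat_b = torch.randn(shape, generator=gen.manual_seed(2), device="cuda", dtype=torch.float16)

for i, t in enumerate(torch.linspace(0, 1, 30)):
    emb = torch.lerp(emb_a, emb_b, t.item())   # blend the two prompts
    lat = slerp(lat_a, lat_b, t.item())        # blend the starting noise
    frame = pipe(prompt_embeds=emb, latents=lat, num_inference_steps=30).images[0]
    frame.save(f"morph_{i:03d}.png")
```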
Three, its fantasy imagery is truly something else. Whether you are looking for landscapes, which I was quite surprised to see Stable Diffusion create so competently, or these amazing tree houses, it delivers. But that's not when I fell
off the chair. I fell off the chair when I saw these realistic fairy princesses. I
did not expect it to be able to create such amazingly realistic humans. How cool is that! Four, we can also create a collage.
Here, we can take a canvas, enter several prompts, and select a region of the canvas for
each of them. Now, one issue is that there is space between the images, and there is another
problem: even if there were no space between them, they would not blend into each other.
No matter, Stable Diffusion can also perform image inpainting, which means
that we select a region, delete it, and it will be filled in with information
based on its surroundings. And the results are spectacular. We don't get many separate
images; we get one coherent image instead.
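If you would like to try this step at home, here is a minimal inpainting sketch, assuming the diffusers inpainting pipeline and the runwayml/stable-diffusion-inpainting checkpoint; the file names and prompt are placeholders, and the collage tool in the video may work differently under the hood.

```python
# Sketch: fill a masked region so separate tiles blend into one coherent image.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

canvas = Image.open("collage.png").convert("RGB").resize((512, 512))
# White pixels mark the seams and gaps to repaint; black pixels are kept as-is.
mask = Image.open("seams_mask.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="a seamless fantasy landscape, matte painting",
    image=canvas,
    mask_image=mask,
    num_inference_steps=50,
).images[0]
result.save("collage_blended.png")
```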
Five, you know what, let's look at a few more fantasy examples. Here are some of my favorites. Six, now these are diffusion-based models, which
means that they start out from a bunch of noise, and slowly adjust the pixels of this
image to resemble our input text prompts a little more. Hence, they are very sensitive to
the initial noise patterns that we start out from. Andrej Karpathy found an amazing way to take
advantage of this property by adjusting this noise just a tiny bit and creating many
new, similar images. When stitched together, the result is a hypnotic video like this one.
Random noise walks, if you will. Loving it!
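Here is a rough sketch of that trick, again assuming the diffusers setup from before: keep the prompt fixed, drift the starting noise a little every frame, and render. The prompt, step size, and frame count are assumptions on my part.

```python
# Sketch: a random noise walk. Fix the prompt, drift the starting noise
# a tiny bit every frame, and the rendered frames become a hypnotic video.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "an ultra-detailed steampunk city at sunset"  # assumed prompt
shape = (1, pipe.unet.config.in_channels, 64, 64)
latent = torch.randn(shape, device="cuda", dtype=torch.float16)

step = 0.05  # drift per frame (assumed value; smaller means a smoother walk)
for i in range(120):
    drift = torch.randn_like(latent)
    # Mix in a little fresh noise while keeping the overall variance near 1.
    latent = (1 - step) ** 0.5 * latent + step ** 0.5 * drift
    frame = pipe(prompt, latents=latent, num_inference_steps=30).images[0]
    frame.save(f"walk_{i:03d}.png")
# Then stitch the frames, e.g.: ffmpeg -framerate 12 -i walk_%03d.png walk.mp4
```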
Seven, it can generate not only images, but with a little additional work, even animations. You are going to love this one. Look. This was
made by creating the same image with the eyes open and with them closed, and with a little
additional work to blend the two together, it looks like this. Once again, the
links to all of these works are available in the video description if you wish
to have a closer look at the process. Eight, you remember that it
can create fantastic portraits, and that it can interpolate between images. Now,
putting the two together, it can create portraits and interpolate between them, creating these
sometimes smooth, sometimes slightly jumpy videos. And don't forget, nine, variant generation is still possible. We can
still give it an input image, and since it understands what this image depicts,
it can also repaint it in different variations.
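As a sketch of how such variant generation can be reproduced at home, here is the image-to-image route with diffusers; note that older diffusers releases call the input image argument init_image rather than image, and the file name, prompt, and strength here are assumed values.

```python
# Sketch: repaint an existing image into variations with image-to-image.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

init = Image.open("input.png").convert("RGB").resize((512, 512))

for seed in range(4):
    variant = pipe(
        prompt="an oil painting of a knight on horseback in front of a castle",
        image=init,
        strength=0.6,          # 0 keeps the input, values near 1 mostly ignore it
        guidance_scale=7.5,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    variant.save(f"variant_{seed}.png")
```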
And finally, ten. The fact that these amazing images come out of Stable Diffusion does not mean that we have to use them in their entirety. If
there is just one part of an image that we like, be it the knight on a horse, or the castle,
that is more than enough. We can discard the rest of the image and just use the parts
that we love best, and make an awesome montage out of it. I would say that here, very few
humans would be able to spot the trick. Now, we discussed that we can
pop the hood and tinker with this AI; that was one of the two amazing reasons behind
these results. But I promised two reasons why this is so good. So what is reason
number two? Is it possible? Yes! This is the moment you have been waiting for.
You can now try it yourself. If you are patient, you can play with the internal
parameters here and get some amazing variants. You might have to wait for a bit, but as of the
making of this video, it works. Now, what happens when you Fellow Scholars get over there?
Who really knows; we have crashed plenty of websites before with our Scholarly Stampede. And if you
don't want to wait, or you wish to run some more advanced experiments, you can run the model yourself at
home on a consumer graphics card. Loving it.
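If you want to go that route, here is roughly what it takes, assuming the diffusers library: half-precision weights plus attention slicing keep the memory footprint down to the few gigabytes of VRAM a consumer card offers. The prompt is, again, just an example.

```python
# Sketch: running Stable Diffusion on a single consumer GPU.
# Half-precision weights plus attention slicing keep VRAM use to a few GB.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
pipe.enable_attention_slicing()  # trade a little speed for a lot less memory

image = pipe("a cozy cabin under the northern lights, digital art").images[0]
image.save("cabin.png")
```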
And if you are unable to try it, don't despair; AI-based image generation is only getting cheaper and more democratized from here on out. So, a little open-source competition for
OpenAI and Google. What a time to be alive! And, please, as always, whatever you do, do
not forget to apply the First Law of Papers, which says that research is a
process. Do not look at where we are, look at where we will be two more papers
down the line. For instance, here are some results from DALL-E 1, and just a year later,
DALL-E 2 was capable of this. Just a year later. That is unbelievable. Just imagine
what we will be able to do 5 years from now. If you have some ideas, make sure
to leave a comment about that below. So, finally, Stable Diffusion: a free and open-source solution for AI-based image generation. Double thumbs up. This is something
for everyone out there, and it really shows the power of collaboration as we,
tinkerers around the world, work together to make something amazing. I
love it. Thank you so much! And note that all this took about 600 thousand
dollars to train. Now, make no mistake, that is a lot of dollars, but this also means that creating
an AI like this does not cost tens of millions of dollars anymore, and the team at Stability AI is
already working on a smaller and cheaper model than this. So, we are now entering not only
the age of AI-based image generation, but the age of free and open AI-based image generation.
Oh yes! And for now, let the experiments begin! Thanks for watching and for your generous
support, and I'll see you next time!