Stable Diffusion: DALL-E 2 For Free, For Everyone!

Video Statistics and Information

Captions
Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Today we are going to look at images, and even videos, that were created by an AI, and honestly, I cannot believe how good some of these are. And is it possible that you can run this AI at home yourself? You'll find out today.

This year, through OpenAI's DALL-E 2, we are entering the age of AI-driven image generation. Most of these techniques take a text prompt, which means that we can write whatever we wish to see on the screen, and first, a noise pattern appears that slowly morphs into exactly what we are looking for.

This is what we mean when we talk about diffusion-based models.

Now, OpenAI's DALL-E 2 can create incredibly creative images, and Google's Parti and Imagen AIs are at least as good. Sometimes they even win linguistic battles against OpenAI's solution.

But there is a problem. All of them are missing something, and that is the model weights and the source code. This means that these are all closed solutions, and we cannot pop the hood and look around inside.

But now, here is a new solution called Stable Diffusion, where the model weights and the full source code are available. I cannot overstate how amazing this is. So, to demonstrate why, here are two reasons.

Reason number one is that with this, we can finally take out our digital wrench and tinker with it. For instance, we can now adjust the internal parameters in ways that we cannot with closed solutions like DALL-E 2 and Imagen. So now, let's have a look together at 10 absolutely amazing examples of what it can do! After that, I'll tell you about reason number two, which makes all this even better.

One, dreaming. Since the internal parameters are exposed, we can add small changes to them, create a bunch of similar outputs, and then, finally, stitch these images together as a video. This is so much better for exploring ideas. Just imagine: sometimes you get an image that is almost what you are looking for, but the framing, or the shape of the doggy, is not quite perfect. Well, you won't need to throw out these almost-perfect solutions anymore. Look at that. With this, we can make the perfect good boy so much more easily. I absolutely love it. Wow.

Two, interpolation. Now hold on to your papers, because we can even create a beautiful visual novel like this one by entering a bunch of prompts, like the ones you see here, and we don't jump from one image to the next in one jarring cut; instead, each image can now be morphed into the next one, creating these amazing transitions. By the way, the links to all of these materials are available in the video description, including a link to the full version of the video that you see here.
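To make points one and two more concrete: with the open weights, the dreaming and interpolation tricks boil down to blending the starting noise between two settings and rendering one frame per blend step. Here is a minimal sketch using Hugging Face's diffusers library; the model id, prompt, frame count, and the slerp helper are illustrative assumptions of mine, not details confirmed in the video.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

def slerp(t, a, b):
    # Spherical interpolation: the in-between noise stays on roughly the same
    # Gaussian shell, which tends to give cleaner frames than a straight lerp.
    a_n, b_n = a / a.norm(), b / b.norm()
    theta = torch.acos((a_n * b_n).sum().clamp(-1.0, 1.0))
    return (torch.sin((1 - t) * theta) * a + torch.sin(t * theta) * b) / torch.sin(theta)

# Two starting noise patterns; the shape matches 512x512 images (latents are 64x64).
shape = (1, pipe.unet.config.in_channels, 64, 64)
noise_a = torch.randn(shape, generator=torch.Generator().manual_seed(0)).to("cuda", torch.float16)
noise_b = torch.randn(shape, generator=torch.Generator().manual_seed(1)).to("cuda", torch.float16)

prompt = "portrait of a fairy princess, photorealistic"
frames = 16
for i in range(frames):
    latents = slerp(i / (frames - 1), noise_a, noise_b)
    image = pipe(prompt, latents=latents).images[0]
    image.save(f"frame_{i:03d}.png")  # stitch the saved frames into a video afterwards

Nudging a single seed's noise slightly instead of sweeping between two endpoints gives the dreaming-style variations; blending between the embeddings of two different prompts gives the visual-novel transitions.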
Three, its fantasy imagery is truly something else. Whether you are looking for landscapes, which I was quite surprised to see Stable Diffusion create so competently, or these amazing tree houses, it delivers. But that's not when I fell off the chair. I fell off the chair when I saw these realistic fairy princesses. I did not expect it to be able to create such amazingly realistic humans. How cool is that!

Four, we can also create a collage. Here, we can take a canvas, enter several prompts, and select a region for each of them. Now the issue is that there is space between the images, and there is another problem: even if there were no space between them, they would not blend into each other. No matter: Stable Diffusion can also perform image inpainting, which means that we select a region, delete it, and it will be filled in with information based on its surroundings. And the results are spectacular. We don't get many separate images; we get one coherent image instead.

Five, you know what, let's look at a few more fantasy examples. Here are some of my favorites.

Six, these are diffusion-based models, which means that they start out from a bunch of noise and slowly adjust the pixels of the image to resemble our input text prompt a little more. Hence, they are very sensitive to the initial noise pattern that we start out from. Andrej Karpathy found an amazing way to take advantage of this property: adjust this noise, but just a tiny bit, and create many new, similar images. Stitched together, they result in a hypnotic video like this one. Random noise walks, if you will. Loving it!

Seven, it can generate not only images, but with a little additional work, even animations. You are going to love this one. Look. This was made by creating the same image with the eyes open and closed, and after blending the two together, it looks like this. Once again, the links to all of these works are available in the video description if you wish to have a closer look at the process.

Eight, you remember that it can create fantastic portraits and that it can interpolate between images. Putting the two together, it can create portraits and interpolate between them, producing these sometimes smooth, sometimes slightly jumpy videos.

And don't forget, nine, variant generation is still possible. We can give it an input image, and since it understands what this image depicts, it can repaint it in different variations.

And finally, ten. The fact that these amazing images come out of Stable Diffusion does not mean that we have to use them in their entirety. If there is just one part of an image that we like, be it the knight on a horse or the castle, that is more than enough. We can discard the rest of the image, keep just the parts that we love best, and make an awesome montage out of them. I would say that here, very few humans would be able to spot the trick.

Now, we discussed that we can pop the hood and tinker with this AI; that was one of the amazing reasons behind these results. But I promised two reasons why this is so good. So what is reason number two? Is it possible that...? Yes! This is the moment you have been waiting for: you can now try it yourself. If you are patient, you can even change the internal parameters here and get some amazing variants.

You might have to wait for a bit, but as of the making of this video, it works. Now, what happens when you Fellow Scholars get over there, who really knows; we have crashed plenty of websites before with our Scholarly Stampede. And if you don't want to wait, or you'd like to run some more advanced experiments, you can run the model yourself, at home, on a consumer graphics card. Loving it.

And if you are unable to try it, don't despair: AI-based image generation is only getting cheaper and more democratized from here on out. So, a little open-source competition for OpenAI and Google. What a time to be alive!
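If you do run it at home, the inpainting from point four maps to a dedicated pipeline in the same diffusers library. A sketch under assumed inputs: a stitched 512x512 collage and a white-on-black mask marking the seams to repaint; the checkpoint name and file names are my assumptions.

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# The collage with visible seams, and a mask where white marks the regions to repaint.
canvas = Image.open("collage.png").convert("RGB").resize((512, 512))
mask = Image.open("seams_mask.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="a seamless fantasy landscape",
    image=canvas,
    mask_image=mask,
).images[0]
result.save("collage_blended.png")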
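And the variant generation from point nine corresponds to image-to-image mode: the input image is partially noised, then denoised again under the prompt. Another sketch; the file names, prompt, and strength values are assumptions, and the parameter names follow recent diffusers releases.

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

init = Image.open("knight.png").convert("RGB").resize((512, 512))

# strength controls how much noise is added before denoising: low values stay
# close to the input image, high values let the model take more liberties.
for i, strength in enumerate([0.4, 0.6, 0.8]):
    variant = pipe("a knight on a horse, oil painting",
                   image=init, strength=strength).images[0]
    variant.save(f"variant_{i}.png")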
And please, as always, whatever you do, do not forget to apply the First Law of Papers, which says that research is a process. Do not look at where we are; look at where we will be two more papers down the line. For instance, here are some results from DALL-E 1, and just a year later, DALL-E 2 was capable of this. Just a year later. That is unbelievable. Just imagine what we will be able to do 5 years from now. If you have some ideas, make sure to leave a comment about them below.

So, finally: Stable Diffusion, a free and open-source solution for AI-based image generation. Double thumbs up. This is something for everyone out there, and it really shows the power of collaboration, as we tinkerers around the world work together to make something amazing. I love it. Thank you so much!

And note that all this took about 600 thousand dollars to train. Now, make no mistake, that is a lot of dollars, but it also means that creating an AI like this no longer costs tens of millions of dollars, and the team at Stability AI is already working on a smaller and cheaper model than this one. So, we are now entering not only the age of AI-based image generation, but the age of free and open AI-based image generation. Oh yes! And for now, let the experiments begin!

Thanks for watching and for your generous support, and I'll see you next time!
Info
Channel: Two Minute Papers
Views: 625,830
Keywords: ai, stable diffusion, dall-e, openai dall-e, openai dalle, dalle 2, google imagen, dall-e free, open source dall-e
Id: nVhmFski3vg
Length: 10min 59sec (659 seconds)
Published: Tue Sep 06 2022