Stable Diffusion 3 - An Amazing AI For Free!

Video Statistics and Information

Stable Diffusion 3 is a text-to-image AI where you write a short prompt and you get these beautiful images. And it is, or will soon be, a completely open technique that is free for all of us to use. Amazing! But it gets better. Oh yes, the paper is now available. I was lucky to have had access to it a little earlier than most, so here is a deeper look at some of the new results, and we will talk about how all this wizardry is even possible.

Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér.

Now, results. Three things immediately caught my eye in the paper. One, text. I am always interested in creating images with text with Stable Diffusion, but I was never able to do it well enough. Here are some of my results from Stable Diffusion XL, a previous version from just a few months ago, and the results were mixed, to say the least. I had to try several times for each text example, and at least half of them did not work at all, and even that other half were passable at best. And now, let's have a look together at the new technique. Oh my goodness! It not only works, but this is fine. Of course, as soon as we get access, we will see how much cherry picking this requires, if any, but this looks quite reassuring. And it not only seems to work more reliably, but it even supports different styles of text. Loving it.

Two, the creativity is incredible. Here are some of my favorites. This is human life depicted out of fractals, self-similar mathematical structures, and this was done extremely well. And a kaleidoscopic bird. This one is an absolute beauty. I love how colorfully it pops out from the background. And this translucent pig that has another pig inside it. And it just goes on and on and on. Truly excellent. And we get all this for free. Wow.

Now, three, the quality of these images is also remarkable. For instance, I did not put this in the creativity but in the quality category, because I think it scores much better here. I love how the jam is dripping into the water, not mixing with it, and as I am a light transport simulation researcher by trade, I cannot help but notice how beautiful the reflections are on the water. My goodness. And then this one, which showcases the Third Law of Papers. Yes, you heard it right. You heard the First Law many times, but there are more laws, four to be exact. The Third Law says that research is a study of failure. A bad researcher fails 100% of the time, while a good one only fails 99% of the time. Hence, what you see here is always just 1% of the work that was done. This showcases how much work writing such a paper takes for a group of scientists. Very imaginative, and it is a little ironic too. Love it.

But how is this even possible? What does the new technique do to get these incredible results? Well, this is still a diffusion-based AI technique that looked at a lot of images and generates new ones by starting out from noise and, over time, reorganizing this noise into an image that you desire.

Now, a few things that caught my eye in the paper. One is a technique called direct preference optimization. If this AI model were a car, this step is essentially a way to fine-tune it to get closer to the preferences people typically have when driving a car. Think about a smooth pedal response and, for instance, a soft suspension. So, does it work? Now hold on to your papers, Fellow Scholars, and if you feel ready to look at the mangled hand that is coming, you are not ready. But if you think you are, here you go. Oh my goodness. And the text isn't great at all. And with this new step, lovely. In their user study, scientists found that humans like the new version a great deal better, and it also helps us get spelling a little more reliably. Well, I can't stop thinking about that hand, and I have to agree on this one being better.

But there is more to it. Oh yes, rectified flows. Now imagine taking out our fine-tuned car for a ride, but on old roads that have lots of twists and turns, and the new one is a straight path through the mountains. And rectified flows give us a much better new road that goes straight through the mountains. In other words, it is more sample efficient, which means that if we give it the same amount of computation time, it gets us higher quality results.

And all the results that you see here are using the 8 billion parameter network, so many of you will be able to run this on your laptops or use one of the cloud providers to do that. And there will also be a lighter version that might even run on your phone. And all this was lots and lots of work, and we get the fruits of this work for free. Results, code, and model weights, all freely available now or soon. Absolutely amazing. Thank you so much. What a time to be alive!

And we still have a deeper look at the Gemini 1.5 Pro AI assistant and its free and open model variant, Gemma, in the works. Subscribe if you wish to hear about those.

Experiment tracking, model evaluation, and production monitoring for your deep learning projects and LLM apps. This is what Weights & Biases does, and it is the best. Everyone is using it. Try it out now at wandb.me/papers or click the link in the description below.
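
The "straight road" picture of rectified flows in the transcript can be made concrete with a toy sketch. This is not the paper's implementation. It assumes a two-dimensional stand-in for "noise" and "image", a hand-written velocity field in place of a trained network, and plain fixed-step Euler integration as the sampler. All names (`euler_integrate`, `straight_velocity`, `curved_velocity`, `endpoint_error`) are hypothetical and chosen only for illustration. The point is that when the path from noise to image is a straight line, very few integration steps are needed, while a winding path with the same endpoints needs many.

```python
import numpy as np

def euler_integrate(velocity, x_start, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = np.asarray(x_start, dtype=float).copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity(x, t)
    return x

# Toy stand-ins (assumptions for illustration, not real image data).
noise = np.array([0.0, 0.0])   # where sampling starts
image = np.array([1.0, 0.0])   # where sampling should end
perp = np.array([0.0, 1.0])    # direction in which the "old road" bends

def straight_velocity(x, t):
    # Straight road: x(t) = (1 - t) * noise + t * image, so the velocity is constant.
    return image - noise

def curved_velocity(x, t):
    # Winding road: x(t) = straight path + sin(pi * t) * perp, with the same endpoints.
    return (image - noise) + np.pi * np.cos(np.pi * t) * perp

def endpoint_error(velocity, num_steps):
    """Distance between where the sampler ends up and the intended target."""
    end = euler_integrate(velocity, noise, num_steps)
    return float(np.linalg.norm(end - image))

if __name__ == "__main__":
    for steps in (1, 4, 16):
        print(steps,
              endpoint_error(straight_velocity, steps),
              endpoint_error(curved_velocity, steps))
```

In this toy setup the straight field reaches the target in a single Euler step, while the curved field only gets close as the number of steps grows, which is one way to read "more sample efficient" for the same amount of computation.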
Channel: Two Minute Papers
Views: 81,069
Keywords: ai, stable diffusion, stable diffusion 3
Length: 6min 41sec (401 seconds)
Published: Tue Mar 05 2024