Stable Diffusion 3 - Amazing AI Tool for Free!

Video Statistics and Information

Captions

Stability AI is bringing one of the most powerful text-to-image AI generation tools for free. Stability AI is pushing out a new update to Stable Diffusion called Stable Diffusion 3, and this is probably one of the most exciting things going on in open-source AI right now. I'm super excited, I know you are too. All right, let's go.

If you're unfamiliar with Stable Diffusion, it's a text-to-image generation model which is available for free, and most online text-to-image generation tools are actually running Stable Diffusion in the background. With tools like this, users are able to create all sorts of amazing stuff just based on text prompts, and Stable Diffusion 3 is a huge upgrade from the previous Stable Diffusion 2. Today we're going to be diving into Stable Diffusion 3. That way you can see what to expect and determine whether or not this is something that's going to be useful for you.

So Stable Diffusion 3 is not just another step in the AI evolution, it's more like a giant leap. With its unparalleled ability to interpret multi-subject prompts and spell out entire imaginations into visuals, it really pushes the boundaries of what we thought was possible with what's now able to be generated in a few seconds, allowing it to understand much more. And what's even more impressive about it is that they're coming out with this Multimodal Diffusion Transformer, which is a brand new architecture that actually uses separate weights for image and language representations, which is specifically going to be useful for improving text understanding and spelling capabilities as compared to previous versions.

So if you've ever tried to generate images in the past using something like Stable Diffusion, you probably know that most of the time, whenever text is inserted into an image, whether that's in a logo or something written on paper, the text tends to come out all sorts of wonky. It doesn't tend to make any sense, as you can see here in this example. Like, I don't even know what this text is saying right now. But what's really impressive here is that this image that you're looking at right now was actually created using Stable Diffusion 3, and the text is so legible it looks like it was even created by some graphic designer. Granted, it's not the prettiest design, but it does look legible and it is actually properly spelled. What's really cool too is that we have differences in the text style as well, so we have a bit more of this playful brush stroke style, whereas we have something a little bit more concrete and stable over here for the "Stable Diffusion 3". This is amazing and really, really impressive.

As you can see from the performance, not only does it perform better just overall from the visual aesthetics and prompt following and typography, but it completely outpaces... What's impressive with this is that it actually comes with models ranging from 800 million parameters to 8 billion parameters, which is a huge range and should hopefully allow desktops with much lower-end specs to be able to run this, as well as configurations with much higher-end setups.

Now, the real beauty of Stable Diffusion 3 lies inside of its technical innovations, which is mostly in its architecture this time around. This new architecture, the Multimodal Diffusion Transformer, is also paired with flow matching. I'm not going to get into the full details of how this tech works, but for a simplified explanation, this allows the images generated to be much smoother, more detailed, and more true to whatever the prompt was that you gave it.

Here are some of the examples coming out from Stable Diffusion 3. And what's interesting is that this architecture, even though you can see it being applied to images right now, they describe it being extended to multiple modalities such as video. So if you've ever seen Stable Diffusion video before, it's nothing near the results from Sora, but I am curious to see how this starts to get implemented into their later text-to-video generation models.

Here we have a translucent pig inside of a smaller pig, which is a very specific prompt that creates such an interesting image. We also have a massive alien spaceship that's shaped like a pretzel, and of course it does incorporate all of these details into the image. As we keep going deeper we can see more examples from this, and it's really exciting to see just how much progress Stability AI has made with Stable Diffusion 3. I mean, take a look at this text. We have very refined text encoders here, to where we have this burger patty and then we also have the coffee element of the prompt perfectly implemented in here, and you can see all of the text, whereas beforehand you may not have seen it as well. We also have this amazing text here with this monkey holding a sign, and even more detail about this mischievous ferret.

If you guys want to see more details about this and learn more about the rectified flow Transformers for high-resolution image synthesis, we're going to have the research paper down in that description box so you guys can go ahead and check it out for yourself. It does get extremely technical, so I am warning you on that front.

Now, Stable Diffusion 3 is not out yet, but we will be covering it here on the channel as soon as it is. There are so many amazing AI tools that are coming out, and we actually covered them in this video. I think you guys are going to want to check it out if you're interested in seeing some awesome stuff like cloning your voice, live drawing AI image generation, and so, so much more. Anyways, thanks for watching, go ahead and check out that video. Until next time, peace.
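To put the 800 million to 8 billion parameter range mentioned in the captions into rough hardware terms, here is a minimal back-of-the-envelope sketch. It is not from the video: the function name `weight_memory_gb` and the listed precisions are assumptions for illustration, and it only counts the storage for the model weights themselves, ignoring activations, text encoders, and any other components a real pipeline would load.

```python
# Rough weight-storage estimate only. Assumptions (not from the video):
# - each parameter is stored at the given numeric precision
# - activations, text encoders, and the image decoder are ignored
# - 1 GB is taken as 10**9 bytes
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str = "fp16") -> float:
    """Return the approximate size of the weights in GB (hypothetical helper)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # The two ends of the parameter range quoted in the captions.
    for label, params in [("800M", 800_000_000), ("8B", 8_000_000_000)]:
        for dtype in BYTES_PER_PARAM:
            size = weight_memory_gb(params, dtype)
            print(f"{label} parameters at {dtype}: ~{size:.1f} GB of weights")
```

Under these assumptions the smallest model's weights would take on the order of a couple of gigabytes at half precision, while the largest would need roughly ten times that, which is consistent with the video's point that the range should cover both lower-end and higher-end machines. Actual requirements will be higher once the rest of the pipeline is loaded.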

Info

Channel: Black Mixture

Views: 6,491

Keywords: blackmixture, text to 3d, text to 3d ai, ai, 3d ai tutorial, nvidia, generative ai, stable diffussion, artificial intelligence, llm, text to 3d model, blender 3d, text to image, image gen, midjourney, free image gen, free ai image, text2image, 2d ai tutorial, ai image tutorial, ai generation tutorial, ai upscale, ai inpaint, ai change background, fooocus, sdxl, stable diffusion, image generator, sd3, stable diffusion 3, multimodal, transformer architecture, two minute papers

Id: MRUY3lKTjeg

Length: 5min 12sec (312 seconds)

Published: Fri Mar 08 2024