Stable Diffusion 3 Takes On Midjourney & DALL-E 3

Captions

Stable Diffusion 3 is really here. I was having a pretty chill week. It was calm, it was relaxed, and then I started to do a video on Stable Cascade. I thought that was the most cutting-edge thing that was going to be announced this week, and then, right when I'm in the middle of filming that, Stable Diffusion 3 gets dropped from Stability AI. Let's jump in and look at it.

And here it is in all of its glory. Now, this isn't accessible to the public just yet. It's still just a preview; they're posting some teaser shots all over the place. Here's essentially what it says: "Announcing Stable Diffusion 3 in early preview, our most capable text-to-image model with greatly improved performance in multi-subject prompts, image quality, and spelling abilities." You can see right on these images on the homepage they're really pushing that text creation ability, which has been something that's difficult for models since the beginning of Stable Diffusion.

But the real big news here is that multi-subject prompt adherence. That's really what matters here. Think about it from an artist's perspective, or somebody trying to actually do a creative job using these tools. If you're not able to specifically say, "Hey, I want a red apple, it's got to be on the right-hand corner of the desk, I want a wooden desk with a chair that's pushed in, and I want there to be a plant on the left-hand side of the desk just next to a bottle of red wine," if you can't actually suggest all of those pieces and then get them coherently placed within an image, then the tools really aren't anything more than something to create pretty pictures and some internet memes with.

Up until now, the state of the art on this has been DALL-E 3. Because it's built off of this transformer model, it has all the information of an underlying large language model, ChatGPT, and it's able to really follow those text prompts nicely and come out with some really high-quality images. So it's one of the big things that Stability AI is really pushing here, and
they claim that Stable Diffusion 3 is outperforming everything else that's come before it. We're going to take a look at some more example images, and we're going to drop those into DALL-E 3 and Stable Diffusion XL. I may even try Stable Cascade, since I've got a working version of that.

So here we go. On the left-hand side of the screen we've got the Stability AI images; on the right-hand side of the screen we've got Pixel Dojo. This is my personal project. With Pixel Dojo you can jump on in here and actually use all these different models in one place, so you can run Stable Diffusion using Stability AI, SDXL, Juggernaut XL, Anime XL, even SDXL Lightning, which just came out from the team over at TikTok. On top of that, you can chat with large language models, and you can even generate DALL-E 3 images, even Stable Diffusion video. In this case we're going to stick to Stable Diffusion, and let's take a look at just SDXL specifically.

The prompt for this image is "epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says 'Stable Diffusion 3' made out of colorful energy." Let's drop that into SDXL and see what comes back. And this is what we get. It's a beautiful image, it's high quality, it looks really nice. I like the kind of cosmic waves coming out of the top of it, but it doesn't adhere to the prompt. It says a wizard on top of a mountain, and it follows that part. Night sky, casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy: you can see that last part just completely falls off. It doesn't generate anything like that.

Let's see if DALL-E 3 fares any better. Even with DALL-E 3, it's still a cool image. You can see the mountains, the wizard, that dark energy coming out of there, and it almost looks like an S, but you don't get any words. It's again not able to generate that "Stable Diffusion 3," the actual words, out of the energy in the sky.

Now let's try this in the other latest model from Stability AI, Stable Cascade. And if
you want to know how to install Stable Cascade, it's fairly difficult and time-consuming, unless you're one of my Patreon subscribers. I've actually got a one-click installer and a one-click launcher that you can check out right over here; I'll have a link in the description for that. Here's what that'll look like. It's running on Gradio, and when you click run, let's see what it comes back with.

Okay, not too bad. This came back with a wizard on top of a mountain, you've got that energy in the sky, and it actually says "Stable Diffusion." It doesn't say "Stable Diffusion 3," and it's not doing it out of the energy out of the top, but this is at least the closest of all the systems we've tested.

The next one is particularly challenging. Think about all that's happening in this picture. The prompt is "three transparent glass bottles on a wooden table. The one on the left has red liquid and the number 1. The one in the middle has blue liquid and the number 2. The one on the right has green liquid and the number 3." There's a lot going on here. You have to understand spatial awareness and positioning, liquid colors, all that stuff. You really have to pay attention to the prompt in order to get this right.

So let's first try this in SDXL, and it gets it completely wrong. You kind of get numbers on the bottles, sort of: it's like an upside-down I, a 1, sort of a 3-ish. The colors are all wrong. About the only thing it got right is that it's glass bottles on a table, but everything else is sort of mashed up.

Will DALL-E 3 fare any better? Let's check it out. Okay, DALL-E, this is really, really good. I love the frosted look of the glass; really aesthetically pleasing here. You've got the wooden table, that looks good. You've got the 1 with the red, 2 with the blue, and 3 with the green liquid. I would say DALL-E kind of nailed this. Pretty impressive.

Let's check out Stable Cascade and see how it fares. A much better job on the text on the bottles and on the liquids, but still the wrong order, the
wrong numbers on the wrong bottles. It says 3, 2, 3. It's got the red on the right instead of the blue in the middle. It's completely wrong, but a little bit better, at least, than the result from SDXL on the text generation front.

The prompt for this one is "a horse balancing on top of a colorful ball in a field with green grass and a mountain in the background." Stable Diffusion 3 comes back with exactly that. Let's see what DALL-E comes back with. Well, that's interesting. It's a green field, it's a bunch of balls, and it is colorful, so I'll say this: it sort of mashed up a bunch of individual balls colorfully, and then you've got a horse really awkwardly sort of standing on top of it, or not really standing but just sort of sprawled out on top of it. Mountains in the background. Overall pretty good adherence to the prompt, but definitely a sort of weird result.

For SDXL, not even close. The adherence to the prompt really isn't there. You've got sort of this ball that, I don't know, it looks like almost the shape of a continent on it. It's colorful, I'll give it that, and then the horse looks like it's sort of behind it, sort of on top of it. You've got a mountain and a green field, but it seems like Stable Diffusion XL sort of just falls off after the first couple of words in a prompt and really loses it from there.

Then Stable Cascade is somewhere between DALL-E and Stable Diffusion XL. You've got a colorful ball, you've got a green field, you've got a mountain in the background, and the horse is just sort of precariously floating in the sky, which, you know, sort of isn't possible.

This has to be one of my favorites because it's so specific. It's a weird image, don't get me wrong, but it's very specific. The prompt is "a painting of an astronaut riding a pig wearing a tutu holding a pink umbrella, on the ground next to the pig is a robin bird wearing a top hat, in the corner are the words 'stable diffusion'," and you see it nailed all of them. Now, obviously we don't know if these images are cherry-picked. I assume to
a degree that they are, but the fact that it nailed every single element of the picture is really impressive.

Let's see how close we get on SDXL. I'm not holding out hope that it's anything more than maybe a picture with maybe an astronaut and sort of the ground underneath them. Let's see. Man, this is super weird. Okay, so we've got this sort of astronaut bird. There is an umbrella, so I'll give it that. It is riding on a pig, and I will say the pig looks much cooler than the one in the left image. Now, this bird down here, I don't know what the deal is with that. It's like another pig bird, and then there's something going on over here. There are no words, there's no top hat. It's really missing a lot of the other elements, just like we've seen in the other pictures.

Now, I have higher hopes for DALL-E because it's generally better at picking up all those references in a prompt. Let's see what it comes back with for this. Okay, this is pretty interesting. It's not quite what I expected. You've got a pig, a wonderful-looking pig. You've got an astronaut riding on top of it. She is wearing a pink tutu, she's holding an umbrella, although the umbrella looks like it's going through her hair, which would be impossible with the space suit and helmet on. You've got this bird floating on a stick, it looks like, but then you do have another bird down here with a top hat on. So you can see it picked up most of the prompt pieces, which is pretty good. It did miss the text, so you don't get "stable diffusion" anywhere on the image, but I've got to say this is the closest to the quality of Stable Diffusion 3.

And for Stable Cascade: all right, not as bad as some of the others. It's got a pig, it's got an astronaut wearing a tutu, there's an umbrella. There's no bird with a top hat, but it does have "stable" in the bottom, so it's sort of picked up that you should put text somewhere in the corner. It's not quite fully coherent.

So just for fun, let's put this into Midjourney V6. Midjourney is really good at adhering to a prompt, but it also has really
high aesthetics. It's probably going to be the best-looking image of the group. We'll see if it's as high quality as far as the adherence goes. And here are the four images. You can see it has most of the aspects. This one down in the bottom left has a top hat on a bird, it's got the pig, the tutu is sort of on top of or underneath the astronaut, it's holding the umbrella, and it says "stable diffusion." That one almost gets it, although he's riding backwards, so it doesn't completely understand how things should be placed. The other one on the right is also riding backwards, but it did get the bird in the top hat, the tutu, the astronaut, the umbrella, and "stable diffusion" at the top, although the prompt said in a corner, so that's not quite right. All of them have writing, though, and it looks like most of the elements. So I've got to say Midjourney looks really good. I love the artwork and the overall aesthetic of this image. It's, again, not quite as good at following the prompt as Stable Diffusion 3.

As I mentioned, this isn't available to everyone yet. You can join the waitlist over here on the Stability AI news page for Stable Diffusion 3; I'll have a link down below so you can do that. You've got some other examples of images here. A lot of them, again, just sort of look at the text generation abilities and its ability to follow those prompts.

Now, "the Stable Diffusion 3 suite of models," so this is telling me that there's going to be not one but multiple models that come out of this, "range from 800 million to 8 billion parameters." There are going to be some monstrous models out of this; 8 billion parameters is quite a lot. "This approach aims to align with our core values and democratize access, providing users with a variety of options for scalability and quality to best meet their creative needs. Stable Diffusion 3 combines a diffusion transformer architecture," that's a new type of architecture for these Stable Diffusion models that more aligns with what you've seen from Sora and the OpenAI team for their new video
generation. It also has flow matching. Flow matching is a little bit different from what's been done before. If you've watched my video on how Stable Diffusion works, you know that it works in steps: you go from a completely static image and you step through it iteratively, one piece at a time, until you reach that final image. Flow matching sort of skips a lot of that, and it just directionally sort of flows through the process, skipping that individual step and getting to what seems like a higher-quality result. It's also a lot faster and more efficient to train, which is going to be a big deal for the fine-tuning community, or the aftermarket of Stable Diffusion.

Yes, I'll say today that Stable Diffusion 3 isn't quite on par with, say, DALL-E 3 or Midjourney V6 as far as the aesthetic ability, but it's going to be an open model. It's going to be open source. You can download it, and you'll be able to fine-tune it, train it, build LoRAs on top of it, all the amazing stuff that we've come to see from the open-source community. And if you saw some of the debacle with Google Imagen over the last few days, you'll see why that's so important. These companies that are in charge of these big models have the responsibility of keeping them as open and usable as possible. Google really dropped the ball there, and that's why open source is so vital and important to the community. At the end of the day, I don't want an AI model that's trying to align me. I want an AI model that I can use freely, openly, and in an uncensored way. So thank you so much, Stability AI, for putting these models out there and making them accessible to people like us. As soon as Stable Diffusion 3 is accessible to the broader audience, I'm going to put it up on Pixel Dojo,
so be sure to check that out. And if you want to check out Pixel Dojo AI, "YouTube 50" is a discount code you can use for 50% off your first month.

Thank you so much to all of my supporters and subscribers. If you haven't subscribed yet, go ahead and do that now, and hit that like button; that really helps me out. As always, I'm Brian Lovett, and remember, all your tech are belong to us. We'll see you next time.

I'm the virtual profit in the breaking down aing the crown from Basics to complex never let you down all your tax ain't I earning the renown
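
The captions describe flow matching only loosely, so here is a toy sketch of the general idea, not Stable Diffusion 3's actual sampler. It assumes straight-line paths between a noise sample and a single known target value, and it writes the velocity in closed form where a real model would use a trained network. The names `velocity` and `sample` are hypothetical and chosen for illustration.

```python
# Toy illustration only: a 1-D "flow" from a noise sample to one target value.
# This is NOT Stable Diffusion 3's sampler; every name here is hypothetical.
import random


def velocity(x: float, t: float, target: float) -> float:
    """Ideal velocity for straight-line paths x_t = (1 - t) * x0 + t * target.

    A real model trains a neural network to predict this quantity from data;
    here the target is known, so it can be written in closed form.
    """
    return (target - x) / (1.0 - t)


def sample(x0: float, target: float, num_steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    x = x0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t stays strictly below 1, so velocity() never divides by zero
        x = x + dt * velocity(x, t, target)
    return x


if __name__ == "__main__":
    random.seed(0)
    noise = random.gauss(0.0, 1.0)  # starting point: pure noise
    for steps in (1, 4, 50):
        # Each run should land on (approximately) the target value 3.0.
        print(steps, sample(noise, target=3.0, num_steps=steps))
```

In this toy case the path is a straight line, so the velocity is constant and even a single Euler step lands on the target. With real image data the learned velocity field is not this simple, and samplers still take multiple steps.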

Info

Channel: All Your Tech AI

Views: 4,687

Keywords: dall-e 3, ai art, stable diffusion, dalle 3, midjourney v6, midjourney version 6, midjourney ai, ai news, stability ai, stable diffusion 3.0

Id: lChN2fMs5H8

Length: 13min 50sec (830 seconds)

Published: Fri Feb 23 2024