Stable Diffusion 3 Announced! How can you get it?

Captions
Stable Diffusion 3 was just announced by Stability AI. What's the big deal, then? Well, I'll tell you: prompt understanding, and text, like real, proper text. And is there anything else? Well, let's check it out. Oh, and what color is the wind? Blue.

Let's just start off with a quick comparison here. So here we have a prompt: "Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says 'Stable Diffusion 3' made out of colorful energy." The example here is Stable Diffusion 3. This is obviously cherry-picked, so we only have one image to go from. But I took the same prompt and put it into DALL-E 3, which is the middle one here, and Midjourney, which is the one to the right here, and I didn't cherry-pick this at all. I just took the first four images out of both DALL-E and Midjourney. I also did some comparisons with SDXL, but honestly we don't even need to look at that, because we're not getting any text at all. The images look fine, that's not the issue, but for this example it's all about the text.

In the Stable Diffusion 3 one here, we actually get some pretty good-looking text. Now, the A and the B have kind of merged together, but it's fine: you can see it actually says "Stable Diffusion 3". In the DALL-E example here in the middle, they're kind of cool, but we are not getting any text recognition at all. Now, DALL-E is amazing for prompt understanding, and most of the time it's pretty good at text, but not in this example. We're going to look at some examples later where DALL-E shines a little bit more. And in the right example here, the Midjourney one, the text is, I mean, you can see what it says, and for one of the images here it actually is spelled correctly. In three of them it is not, but it's very, very close. However, the text in the Midjourney one isn't really getting the style of the prompt, so it's not really casting a cosmic spell into the sky that says "Stable Diffusion 3". In the Stable Diffusion example, it actually becomes a part of the
image. I'm going to check some more comparisons in a bit.

Now, if you go to Stability AI's site, they have a news post basically saying: "Stable Diffusion 3: announcing in early preview our most capable text-to-image model, with greatly improved performance in multi-subject prompts, image quality, and spelling abilities." What that means is basically that it's going to be able to understand your prompts much, much better and be able to get text in there. Is it going to be much, much better image quality? I don't believe so at this time, but we'll need to compare once custom-trained models are out there.

Now, looking at the examples here, which are obviously cherry-picked, you can see the text is, well, pretty good. So we have a text here, "Go big or go home", next to this apple. Here we have the "Stable Diffusion 3" inside of this paper, newspaper clip, magazine clip, whatever. And here we actually have text on two different parts: you have "Go" on the sign here and "Dream On" on the bus. And if you look closely, it actually says "Stable Diffusion" on the side of the bus. It's not super clear, but it looks like it's spelled correctly: looks like one I, two Fs, and one S there. So that's so far pretty cool.

Now, this isn't available for you to use yet. However, you can sign up for the waitlist, and you do that by clicking this little thingamajig here, which will get you to this sign-up form. Sign up here, submit, and you'll be on the waitlist. I talked to a developer about this, and we will be seeing a white paper in the coming days. After that, they're going to start inviting people to the preview. I know some YouTubers have already said that they have officially gotten a confirmation that they've gotten in. Yet I haven't pinged Emad about that. Some of us are actually focusing particularly on Stable Diffusion.

But in general, looking at these images, we can't say much, because these are examples without really any prompt or stuff like that. However, if you search around on the internet a little bit, you can
actually find that on Twitter some of the people at Stability AI, in this case Andre, who is working with media at Stability, have posted images with the prompts. So in this one here: "Photo of a 90s desktop computer on a work desk. On the computer screen it says 'welcome'. On the wall in the background we see beautiful graffiti with the text 'SD3' very large on the wall." So in this, again, cherry-picked example, it's very good.

Now, if you compare this to, for example, DALL-E, which is this one here, and I'm going to pull up a Midjourney one, which is this one to the right here, we can see that in the DALL-E one here to the left we're getting some "welcome" on the screen. Looks very good, fairly good. We are getting the SD text on the wall behind here; however, it doesn't say "SD3": it's an S here, this one says "SDP3", and the other one I can't really read at all. The same prompt in Midjourney gives you "welcome"; you get an "SD3" on the screen in three of the examples; you get an "SD3" behind here in some of them. This one says "S D3" or "SDI 3". So, you know, it's somewhat getting it, but not fully.

Comparing the prompt understanding, just apart from the text, I'd say they're currently on an okay level, because we're comparing random results with a cherry-picked result. So we'll have to do a proper comparison once we can start generating ourselves. This is just a rough estimate.

Next up, here we have a prompt that is: "Resting on the kitchen table is an embroidered cloth with the text 'good night' and an embroidered baby tiger. Next to the cloth there is a lit candle. The lighting is dim and dramatic." You can see that for both Stable Diffusion 3 and DALL-E you're getting good text here, so there's good prompt recognition regarding the text. You can see that it says "good night", and for two of the images it actually, well, looks pretty good. For Midjourney, we are losing the text in most of the images; however, we are getting a more cinematic vibe. So just from a visually appealing or aesthetically appealing
sense, that image looks, well, a little more beautiful. However, from a prompt perspective, both Stable Diffusion 3 here and DALL-E kind of win in that regard.

If you want to keep browsing, there are more images on Twitter. Check out Emad, check out Andre. Here's an example with three transparent glass bottles on a wooden table. It's actually understanding that the left one should be red, the middle one blue, and the green one here is on the right, and they're numbered 1, 2, 3. So that's pretty cool. I would love to know what you feel in the comments below.

But there is more stuff if you just keep checking Twitter. In this image we have: "Photo of a red sphere on top of a blue cube. Behind them is a green triangle. On the right is a dog, on the left is a cat." And that is tremendous prompt understanding. Really good, if I say so myself. Thanks for watching. See you
Info
Channel: Sebastian Kamph
Views: 35,978
Id: 6vMq4yEwoDQ
Length: 7min 56sec (476 seconds)
Published: Sat Feb 24 2024