Stable Diffusion 3 API Released.

Stability AI has been a key player in the generative AI game for quite some time. Compared to their closed-source competitors like DALL-E and Midjourney, Stable Diffusion and Stability AI have been kept open source, which has been great for the community. Not only is Stable Diffusion open source, but it has been the most professional tool compared to the competitors. Just consider all the features that we have in Stable Diffusion, like ControlNets, face-swapping abilities, you know, stuff like that.

Today marks a new era: Stable Diffusion 3 is now available. Stability AI say here on their Twitter: "Today we're pleased to announce the availability of Stable Diffusion 3 and Stable Diffusion 3 Turbo on the Stability AI Developer Platform API. We have partnered with Fireworks AI, the fastest and most reliable API platform in the market, to deliver these models."

So what does this mean? I've had access to Stable Diffusion 3 for a few weeks now. I've been testing it out, seeing how it feels, and I'll show you some quick examples in a bit. But basically, so far Stable Diffusion 3 has been very limited and not available to a lot of people. Now that has changed, so basically anyone can use it through the API.

In the example on Twitter here, they have a prompt that says: "Awesome artwork of a wizard on the top of a mountain, he's creating the big text 'Stable Diffusion 3 API' with magic, magic text, at dawn, sunrise." This is the one you see here. If you haven't seen my previous video on Stable Diffusion 3, what you can expect is better prompt understanding and the ability to prompt for text, as you can see in some of the examples that they put up on Twitter, which are, you know, obviously cherry-picked, but for a base model they are very good.

This one says, prompt: "The red sofa on top of a white building, graffiti with the text 'the best view in the city'." So it says here "the best view in the city", and it's spelled correctly. We got a couple of extra dots down there, which I think is fine. And in the example here, the red sofa is actually on top of the white building, and, you know, they probably have the best view in the city. Of what? Well, we don't know. We can see some of the background here, but that's about it.

Another prompt here is "Portrait photograph of an anthropomorphic tortoise seated on a New York City subway train." So, you know, this is what the turtles would have looked like if they were kind of half, semi real. I wonder which one it is: Leonardo, Donatello, Raphael? Oh, doesn't matter, it's one of them. Can't be anyone else.

They keep on showing more here: "Aesthetic pastel magical realism, a man with a retro TV for a head, standing in the center of the desert, vintage photo." What they're trying to show here is basically the prompt understanding. Actually having a man with a TV for a head is surprisingly hard for the previous models. So you can put a lot of stuff in there, which I think is very cool. When I say a lot of stuff, I mean a lot of text in the prompt, but also a lot of stuff in the images. Compared to when the new DALL-E was released, it was like, yeah, you can have this character there, this character there, holding this, and being in this background and this setting, whatever. And Stable Diffusion 3 is actually doing sort of the same thing.

This example says "a cardboard box with a face", and the text says "it's not good to think in here", which is quite a long sentence, shorter words though. "The cardboard box is large and sits on a theater stage." And I think, you know, that's pretty good.

They've got some key takeaways here. Stable Diffusion 3 and Stable Diffusion 3 Turbo are now available on the Stability AI Developer Platform API; we talked about that a little bit. "We have partnered with Fireworks AI, the fastest and most reliable API platform in the market," yada yada. They said the same on Twitter, but there's some more information here which is not available on Twitter.

It goes on to say here: "As revealed in the Stable Diffusion 3 research paper, this model is equal to or outperforms state-of-the-art text-to-image generation systems such as DALL-E 3 and Midjourney v6 in typography and prompt adherence, based on human preference evaluations." So human preference evaluation is basically, you know, you generate, let's say, four images, and then someone goes in and says, okay, this was the best one, and clicks. Kind of a voting system, right? Then they get four new images, and again, oh, this is the best one. So that's what they mean by human preference evaluations. They're blind tested. Well, hopefully they're blind tested; if it's not blind tested, then, you know, that's bias.

"The new Multimodal Diffusion Transformer uses separate sets of weights for image and language representations," so that improves text understanding and spelling capabilities compared to previous versions of Stable Diffusion. And if you've been using Stable Diffusion for quite some time now, if you've been with me through all of this journey from day one, you know that Stable Diffusion can't spell. I mean, it is what it is. We have found creative ways around that, especially with, you know, ControlNet.

Here we've got another set of examples of red sofas. In this one we have "a red sofa in the middle of a beautiful garden, wooden sign with the text 'the best view in the garden', impasto painting, colorful flowers." Hey, what do you call a fake noodle? An impasta. So we can see this one again, and then we have another one, "the best view at home". This one is an embroidery with colorful flowers, which I think looks pretty cool.

So I did some testing of my own. I have this image, for example, which is just a quick test. I prompted for a neon cyberpunk city street with "AI dad", so I got "AI dad" here. I should probably get "AI dad" on a cap or something. I did try to test some of the skin capabilities, because a lot of the time with many models, especially SDXL when it was released, you get this overcooked, kind of overbaked result, you know, glossy, plasticky stuff. In this example we got pretty good skin, fairly realistic. It's a little too much, but it's a base model, so I would say that we're getting there, and it's looking pretty good.

There's a segment on safety here, and it says: "We believe in safe, responsible AI practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors. Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment. By continually collaborating with researchers, experts, and our community, we expect to innovate further with integrity as we continue to improve the model." So, I mean, it's a lot of words. It doesn't say specifically what they're doing, but I guess that's expected; you can't reveal all of the work. But just having a paragraph that says they're working on it, with it being open source and uncensored, you know, I understand the concern. It's a hard topic.

So again, to reiterate, guys: this you cannot download and use locally. It's only available through APIs, so you need to use separate tools and platforms for it. But there's something interesting they say here: "While the model is available via API today as part of its initial launch, we are continuously working to improve the model in advance of its open release. You can anticipate seeing these improvements in the upcoming weeks." So we can expect to see an updated version before the weights are released, which is kind of cool. But for now it's going to be available through the API, and they're working with Fireworks.

So, yeah, what do you think? Is it going to be an improvement over 1.5 and SDXL? I think it kind of will, especially when we get some fine-tuned models that are trained by you guys in the community, because you are doing great work. Thanks for watching. I'll see you in the next one. See you.
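
Editor's note: the "voting system" described in the transcript for human preference evaluation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the Stable Diffusion 3 research paper. The function names `run_blind_round` and `preference_win_rates`, the model labels, and the idea of shuffling candidates so the rater cannot tell which model made which image are all hypothetical choices made for this sketch.

```python
import random
from collections import Counter


def run_blind_round(candidates, pick, rng):
    """Run one blind comparison and return the winning model's label.

    candidates: dict mapping a model label to the image it generated.
    pick: callable that receives the list of images (without labels)
          and returns the index of the image the rater prefers.
    rng: random.Random instance used to shuffle the presentation order.
    """
    models = list(candidates)
    rng.shuffle(models)  # hide which model produced which image
    images = [candidates[m] for m in models]
    chosen = pick(images)
    return models[chosen]  # map the rater's pick back to its model


def preference_win_rates(votes):
    """Turn a list of winning labels (one per round) into vote shares."""
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {model: n / total for model, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical data: placeholder strings stand in for real images,
    # and the "rater" here just picks at random.
    rng = random.Random(0)
    candidates = {"model_a": "img_a", "model_b": "img_b",
                  "model_c": "img_c", "model_d": "img_d"}
    votes = [
        run_blind_round(candidates, lambda imgs: rng.randrange(len(imgs)), rng)
        for _ in range(20)
    ]
    print(preference_win_rates(votes))
```

Shuffling before each round is what makes the test blind: the rater only ever sees unlabeled images, and the label is recovered after the click. Real evaluations typically also use many raters and many prompts, which this sketch leaves out.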
Channel: Sebastian Kamph
Views: 20,690
Id: JdGlhFCuYD8
Length: 8min 1sec (481 seconds)
Published: Thu Apr 18 2024