Stable Diffusion 3 EXPLAINED + Compared VS Midjourney V6 VS DALL•E 3

Video Statistics and Information

The latest version of Stable Diffusion is imminent, and Stable Diffusion 3 promises us more than ever before. We can expect higher-quality images, better spelling capabilities, and the ability to understand complex relational prompts. I'm going to take you through everything you need to know about this latest release, and we're also going to compare the images coming out of Stable Diffusion 3 with the existing best-in-class AI art generators, Midjourney and DALL·E 3. So buckle up, ladies and gentlemen, and let's dive into the latest news in the AI art world.

So one of the most interesting features coming out of Stable Diffusion 3 is its enhanced subject prompting ability, and this is how Stable Diffusion understands and interprets complex prompts with objects that relate to each other in complex and dynamic ways. For example, Emad Mostaque, who is the CEO of Stability AI, the company behind Stable Diffusion, tweeted this image of "a photo of a red sphere on top of a blue cube, behind them is a green triangle, on the right is a dog, on the left is a cat." You can see here that it has a number of very complex elements, and each of these has been generated with exact perfection and prompt adherence.

Now, this is extremely important if you are looking to create complex scenes and tell stories inside of your image. So if you were looking for an image of a Caucasian male center screen with a microphone in front of his face and a green plant above his right shoulder, with a gray concrete rustic background, then Stable Diffusion 3 would be able to handle this prompt exactly. Now, just to show how much of a step forward this is, this same prompt was entered into SDXL and DALL·E, and you can see that both of these AI generators failed; they could not compare to the abilities of Stable Diffusion 3 on this multi-prompt task.

Now we can take a look at a few more images coming out of Stable Diffusion 3, and here we can see the ability to generate diverse sets of images. You can see this candid
photography style with a blurred background that even incorporates text onto the blackboard. We can take a look at more surreal pieces of art being generated by Stable Diffusion 3, and you're going to notice that the composition and overall aesthetic look to have taken a huge step forward. Here we have a photorealistic chameleon and also a more abstract artwork. Now, I have to say that this piece on the left here is more reminiscent of what we are used to expecting from Stable Diffusion: you can see that it's quite oversaturated and garish.

Now, Stability AI has announced that they are opening the waitlist for an early preview. This means that at the moment it's not fully available for everyone to use, but they're going through a testing phase before they go on to a general public release. They say that this phase is crucial for gathering insights to improve its performance and safety ahead of open release. You're able to sign up for the waitlist if you want early access by clicking on this link, and you are simply taken to a form that allows you to submit to the waitlist.

Now, what's really exciting in Stable Diffusion 3 is the enhanced text generation capabilities, and we're seeing some beautiful pieces of typography being generated inside of Stable Diffusion. For example, here we have a graffiti-style sign with the text "when SD3," and you can see that it's both realistic and coherent, and the spelling is perfect; there are no mistakes at all. And here you can see that the text has been generated inside of these different images. Here's one with Kermit the Frog: "Hi-ho, Kermit the Frog here. Good night, folks." That's my best Kermit the Frog impression. Pikachu: "good night."

Here we can see an example of a photorealistic image being generated inside of Stable Diffusion. Now, I particularly admire the, should we say, wet-ink typography that has been generated inside of the sign that she's holding up. I really admire the realistic strokes of the text and these loose ends to each of the characters,
giving them a very hand-drawn feel. Now you can see here some really diverse typographic styles coming out of Stable Diffusion 3. I particularly admire this collage of different magazine elements compiled together in this composite, giving us "Stable Diffusion 3" in this large sans-serif typeface.

Now, the ability to generate text inside of Stable Diffusion gives us a whole host of possibilities, including creating logos, developing signage, and creating some beautiful typographic quotes. I was able to generate my own fonts inside of Midjourney and actually build them using entire character sets. Here is one that I built called Nogle, and here is another one, a more hand-drawn, hand-painted effect called Backus, and a bubble font called Plop. I was able to generate entire character sets for the benefit of creating beautiful fonts, which I'm then able to turn into usable fonts and sell as digital products. So if you're interested in learning that process, do make sure to check out the course in the description below.

Now, one issue with Midjourney's text generation capabilities was that it would not always perfectly spell exactly what you asked it to; it would get roughly 80% of the characters correct. However, on closer inspection of Stable Diffusion, you can see that in all the examples I have seen so far it has reproduced the given input with 100% accuracy. Here you can see they asked for a watermark of "Stable Diffusion" in the bottom corner, and it has spelled "Stable Diffusion" correctly, and again here on the hero image they have presented of the tool.

Now Andre, who is the media lead at Stability AI, has been posting a lot more previews of Stable Diffusion 3, and he's shown off a few very exciting capabilities. It's interesting that he's based in Guana. I had to search where this is; it's actually a city in Brazil, in case you're interested. After Stable Diffusion 3 comes out, there are a number of features that we are expecting to be added in, and some of these are pretty exciting.
First of all, we'll have the ability to update and iterate on our images by selecting parts and inpainting them, for example changing a plate of food on this little cat, or changing the entire animal to a raccoon. You can also easily add or remove other elements and add video, and this is where Stability is heading next. Now, Emad Mostaque has also tweeted that he's looking to make an open-source version of Sora, but he needs a lot more computing power to complete the training. So it looks like there are some really exciting things coming out of Stability AI in the next few weeks and months. Now here's another example of the improved composition, collaboration, and iteration of Stable Diffusion, and you can see the ability to iterate and change elements of the image before turning it into an animated video.

So we're still waiting to get our hands on Stable Diffusion 3, but what we've been able to do is take some of the prompts they've been using and put them into other AI generators, and I'm going to show you how they compare.

So first of all, we're going to use the prompt "studio photograph closeup of a chameleon over a black background." This was the Stable Diffusion version, and you can see here we're looking at a photorealistic image of a reptile. And here is the Midjourney version. I would say, for me, there's something about the colors that seems slightly more aesthetic and more pleasing to the eye, though you might even say that the Stable Diffusion version is more lifelike, more realistic, and slightly crisper. But if we look at some of the details here, you can see that they are quite comparable in the amount of detail in each image. I would have to say that some of the lighting seems slightly more realistic on the reflective scales of the chameleon inside of Midjourney.

Now if we also look at the DALL·E version, you can see it has a very DALL·E filter to it. I think it has a very high dynamic range, which is slightly exaggerated: the shadows are very dark
and the highlights very light, and there always seems to be almost a vignette applied to all DALL·E images, which means that the corners are always extremely dark and the center is very light. This seems to happen on all of the images you get out of DALL·E, so there are some real telltale signs from DALL·E. I will also say that it's slightly less realistic than the other two.

Now, it's very interesting how similar the composition is in each. Let's look at them all together, and we can really pick out some of the differences between the AI art generators, looking at the strengths and weaknesses of each. I would say that Stable Diffusion comes out with the most photorealistic version of a chameleon, Midjourney produces the most aesthetic, and DALL·E has the most stylized. It's interesting that the color schemes coming out of Midjourney and DALL·E are more similar; they have certainly picked up on a very consistent turquoise, teal, and orange as the dominant colors. There's almost a little bit too much harsh lighting coming out of the DALL·E version, and if we just look at the prompt again, "closeup of a chameleon," you can see that DALL·E has perhaps not gone with so much of a closeup compared to the other two.

Okay, moving on to the next prompt, which is "a painting of an astronaut riding a pig wearing a tutu holding a pink umbrella, on the ground next to the pig is a robin bird wearing a top hat, in the corner are the words 'stable diffusion.'" So here we have a complex surreal prompt with many interrelated objects that are being asked to be put in very specific places within the image. These elements are being put both in specific places, such as "in the corner," and in relational spaces, which means in relation to other objects, for example "on the ground next to the pig." So let's take a look at how Stable Diffusion does. You can see here that it has completed the image perfectly, and all of the elements are in the correct place. It's quite a pop art style, very
bold and garish, that has come out of Stable Diffusion. There is an odd object appearing inside the umbrella here; I don't know what that is.

And here in the Midjourney version, you can see that it has not been able to adhere to the relational prompt as well as the Stable Diffusion version. The main issue is that the astronaut is not holding the umbrella, and the pig is to the right of the robin. There is also a strange tattoo on the pig's bottom. But it's very interesting to note the very different styles that are generated from these two based on exactly the same prompt, with no style prompting added in. You can notice here there is this almost painted, illustrated image compared to this very, should we say, coarse, saturated collage.

Now we'll take a look at the DALL·E version, and here you can see that it has adhered to the prompt. The only thing I can see that it has messed up is that it has spelled "diffusion" wrong. Apart from that, we have the little robin wearing a top hat in the right place, and we have a pig wearing a tutu. I have to say the pig is wearing the tutu more realistically than the first pig in Stable Diffusion. I'm not sure how a pig would wear a tutu; would it just balance one on its back like this? So I would say, actually, that DALL·E has performed exceptionally well in this relational prompt challenge. It's also created almost a 1960s advertising-style image, with this quite airbrushed, soft-focus oil painting look. I would say that the stance and positioning of the robin is a little bit unusual, as well as its face being rendered slightly incorrectly: there is no clear connection between the beak and the eyes of this little robin. I do like how the pig is looking at the robin, though.

If we compare all three of these together, it's interesting to explore how the different styles are implemented, given that this was a prompt without any stylistic interpretation. Overall, I would have to say that my personal tastes are most drawn to Midjourney, followed by Stable
Diffusion, and finally DALL·E. But if we're going on prompt adherence, you would have to say that it would be Stable Diffusion, DALL·E, Midjourney. If we're looking at coherence, or should we say realism, the ability of the AI art generator to create an image that matches my expectations of reality, I would say that Midjourney is number one there, whereas in Stable Diffusion the issue is that the pig is not exactly wearing the tutu correctly, and the issue with DALL·E is that it has generated a tiny little robin with a rather messed-up face.

Now moving on to the third challenge, and the prompt for this one was "epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says 'Stable Diffusion 3' made out of colorful energy." The Stable Diffusion version came out with a well-defined image. We have the words spelled correctly, and everything is coherent. I would say that the detail is quite low. It's not necessarily obvious that this is a painting, though it certainly has a slightly anime feel to me. Now comparing this to the Midjourney version, you can see that it had some problems rendering the exact text that we asked for, but the image is aesthetic and it has understood the majority of our prompt. Finally, we have the DALL·E version, where it's completely messed up and hasn't got the text in correctly at all. It's once again got this quite high dynamic range, intense saturation, and contrast typical of DALL·E generations. However, I do like the bluing of the wizard's gown.

So let's take a look at how these all look together. We can see that overall Stable Diffusion has created the most prompt-adherent piece. Beyond that, I would say that all three of them have created perhaps not the most aesthetic or beautiful pieces. It's interesting to note that Stable Diffusion and DALL·E have gone with very similar compositions, whereas Midjourney has tried something a little bit different, with this off-center focal point leading us from left to right. As for a
painted style, I would have to say that DALL·E is least adherent to this part of the prompt, and the other two perform much better in reflecting that aspect. So I would say that Midjourney still looks like it has the overall greatest taste out of the three. You'll have to really work to get something out that matches a keen eye for stylistic intent, but I'm excited to try this out firsthand, and of course Stable Diffusion will have the advantage of being entirely open source.

I'm curious: which do you like best? How do you rate the strengths and weaknesses of each of these different AI art generators? Do let me know in the comments; I always enjoy reading the discussion. Thanks for watching. I appreciate you being here, and I'm wishing you a delightful day.
Channel: AI Samson
Views: 14,772
Keywords: ai, samson vowles, stable diffusion 3
Id: AVV9jm2E9lI
Length: 16min 41sec (1001 seconds)
Published: Wed Feb 28 2024