SDXL 1.0 Takes On Midjourney

Video Statistics and Information

Captions
Stable Diffusion XL 1.0 has been released. Does it live up to the hype? Let's check it out. I've been messing around with this ever since it was released; in fact, I installed the 0.9 release because I wanted to try it as early as possible.

The big change here is that the base model is trained on 1024x1024 images, which means that by default you can create larger images than you could with any of the previous Stable Diffusion models. As long as the resolution you choose works out to about 1,048,576 total pixels, you should be good to go. This opens up a whole range of aspect ratios and resolutions that simply weren't possible before: vertical 9:16, portrait 4:5, square 1:1, photo 4:3, landscape 3:2, widescreen 16:9, and even cinematic 21:9.

On top of that, this is the largest model anyone has open-sourced at this point, and it also ships with a refiner model: not only do you have the base model that generates your Stable Diffusion images, you also have a refiner that sits on top of it and takes your images a little bit further. But how much further? Is this finally a competitor for Midjourney? To test that, I generated a bunch of images, so let's take a look at some of them. To get started, I'm using the latest version of InvokeAI, which is 3.0.1post3.
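As a rough sketch of that pixel-budget rule: given an aspect ratio, you can solve for the width and height whose product lands near 1024x1024 and snap them to a grid. The helper below and the multiple-of-64 snapping are my own illustration, not something from the video:

```python
import math

# SDXL 1.0 was trained around a budget of roughly 1024*1024 = 1,048,576 pixels.
TARGET_PIXELS = 1024 * 1024

def sdxl_resolution(aspect_w: int, aspect_h: int, multiple: int = 64) -> tuple[int, int]:
    """Snap an aspect ratio to a width/height near the SDXL pixel budget.

    Rounding to a multiple of 64 is an assumption: SDXL's VAE downsamples
    by 8x and the UNet downsamples further, so 64 is a safe grid.
    """
    ratio = aspect_w / aspect_h
    width = math.sqrt(TARGET_PIXELS * ratio)   # width * height == TARGET_PIXELS
    height = width / ratio
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(width), snap(height)
```

For example, `sdxl_resolution(16, 9)` gives 1344x768 and `sdxl_resolution(4, 5)` gives 896x1152, which match commonly cited SDXL-friendly presets.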
You can download Stable Diffusion XL 1.0 directly into it and get started really easily; you can do the same with AUTOMATIC1111.

What I wanted to test first is how SDXL handles pretty simple prompts. My prompt here was just "a photo of a German Shepherd riding in a first-class airplane seat," and the result is pretty darn good: much, much better than you would have gotten out of one of the previous builds of Stable Diffusion, and a little closer to what I'd expect out of something like Midjourney. In its last few releases, Midjourney has really gotten to the point where you don't need a long, structured prompt to get a high-quality result, so it's cool to see that Stable Diffusion XL, at least at first glance, appears to be headed that way as well.

To break down what some of the other pieces of the prompt mean, since some of this is new: we obviously have the prompt at the top, but now we've also got a positive style prompt. In here you could put "pixel art," "cinematic," or "photographic," and this just drives the style of the art. For this image I used "photographic," and it's what you'd typically expect from a photograph: very high quality, very photorealistic, other than my dog kind of hanging through the bottom of the seat, but we won't get into that. I also tried "pixel art"; here's what happens with the exact same prompt when you just change the style. I tried "cinematic" too: now it has a big depth of field and a more front-facing shot, that epic cinematic style you'd expect. You can also do things like line art and cartoons; I love that this one gave the German Shepherd a little German flag as its collar. So you can really drive the style of the art, and this is something you used to have to do in the prompt itself by selecting different artists or adding artist names and art styles,
but now you can simply add it right here.

The other part you're used to is the negative prompt: you can still add one, and you can also add a negative style. So if you want a photograph but definitely don't want any cartoonish features, you could add "cartoon," "anime," and that sort of thing to the negative style. For the image generation we're using Stable Diffusion XL Base 1.0, and here's the other piece that will be new if you haven't used this before: the refiner. Once your image gets 70 percent of the way through, generation kicks over to the refiner model, which finishes the rest of the image. From what I've seen, you get a little higher quality; it fills in fine details that were perhaps missing from the original image. The refiner is not meant to take you all the way from pure noise, like a normal Stable Diffusion generation starts from, to the final product; it's only meant to take a mostly denoised image and carry it that last 20 or 30 percent.

The rest of the refiner settings should be pretty straightforward if you've used other versions of Stable Diffusion. You can select the steps, which is how many iterations the refiner runs on top of the iterations the base model already did. You get the CFG scale, which controls how closely it adheres to the original prompt. The aesthetic score, from what I can tell, is basically how beautiful it scores the image to be: the higher the setting, the more blurred the background, the more bokeh, the more, I don't know, Instagrammy, so play around with the aesthetic score and see what you like. And the refiner start is the percentage of the way through the first generation at which the refiner actually kicks in.

One other really important thing to note: I have an RTX 3090, so I've got 24 gigabytes of VRAM. The problem I ran into
right off the bat was that, unless I went in and tweaked my InvokeAI settings, the base model would render the first part of the image, unload itself from memory, load the refiner model, render the rest of the image, and then, if I was doing two, three, four images in a row, unload and reload over and over. So go into your InvokeAI settings and increase the number of models it can keep in memory. You can only do this if you have enough VRAM for both models to fit, so figure at least 16 gigabytes of VRAM as a minimum; otherwise you're going to get very slow generation times.

Now, of course, I didn't just generate pictures of German Shepherds, although who could blame me; they're super awesome. Here's another example where I just said "KFC chicken burger, dark background, quality photo, fried chicken, studio photo, smoke," and the quality of the image that came back is really, really good. Images of this quality used to require a third-party, manually trained model. Same thing here: I said, okay, make that chicken burger into a burrito, and got another amazing shot; you could use these as restaurant stock photography.

The next series of images I did were cars. With prior versions of Stable Diffusion, you couldn't get very high-quality photos of cars unless you loaded a third-party model from Civitai, some checkpoint trained specifically on cars, but in this case I'm pretty blown away by some of the results, and again, these are very simple prompts. For this one, for example, the prompt was just "Tesla Model X Plaid" plus "photograph," and getting this quality of photo with almost no effort is kind of a big generational leap.

Now, of course, what fun would Stable Diffusion be if you didn't do some
kind of silly, random, crazy things? Here's a guy who's been turned into a cucumber. Here's a Lego race car; the depth of field on this is really nice. This one has the aesthetic score cranked way up on the refiner, which gives you that depth of field and bokeh you'd expect from shots like these.

Here's something interesting: I did a wide photo and put the resolution outside of the roughly one-million-pixel budget I talked about before. This one didn't fall within 16:9 or 1:1, and you can see it stretched the car in the middle. Similarly, when I made the images too tall, the same thing happened, and you'll recognize this from other Stable Diffusion models, where you get repeats of the car or of the person: multiple heads, multiple hands, strange objects and abnormalities based on the training data. But when I went back to 1:1 or one of the other supported resolutions, I started getting that really high-quality look again.

At this point I was testing some of my Midjourney prompts. "An army of robots" gave back exactly what I was hoping for. "A photo of a dog at a cafe in Portugal": you really couldn't tell by looking at it that it isn't a photo, and I think that's pretty cool. Some of these are very high quality, though you'll see abnormalities every once in a while: something going wrong in the mouth with the teeth, or eyes that aren't quite right. Overall, though, I'd say this is much, much better than what you would get out of Stable Diffusion with very simple prompts. The prompt for this one was "a close-up of a German Shepherd dog, tropical palm trees in the background, inside a coffee shop in Portugal."

I'm somewhat obsessed with robots and artificial intelligence, as you can tell, so this is a series I did on robots. Take a look at this one, for example; it's just a futuristic
robot, and I think it did a really good job of coming up with some very appealing images.

The other interesting thing is that Stable Diffusion XL can purportedly render accurate text, so I tested that by trying to create something that said "All Your Tech AI," and you can see "All Your Tech." Most of these were pretty decent from what I could tell. It also depends on the sampler you use: Euler came back with slightly better results in some cases than the other samplers I tried. This one is pretty legible with "All Your Tech," and so was this one, although I don't know what's going on at the top there. Euler, I think, also produced slightly better lighting and shadows; this photo of a Lamborghini has really high-quality, realistic-looking lighting that I wasn't quite getting from some of the other samplers.

And that leads me to something else I noticed. One of the things that drives me crazy about most of these models is the quality of the eyes: look into the eyes and something's just not quite right. But every once in a while, when I cranked up the sampler steps, and in this case I went to a hundred, you can really see the quality of the eyes. These look just like my actual German Shepherd's eyes, almost staring into your soul, and I thought it was a really cool look. There are some unfortunate abnormalities around the mouth, but this image got the eyes spot on, and that's something Stable Diffusion in general has had a hard time with. You used to need things like face-fix filters to make some of those abnormalities go away, but I think we're getting closer to that no longer being an issue.

Speaking of samplers and steps, I ran a bunch of experiments with different samplers. This was Euler with 15 steps for both the base model and the refiner, and
then I cranked it all the way up: this was a hundred steps for each, and if you look down at the bottom of this bun, you can see it adds a lot more detail than you get at the lower step counts, where it just looks uniform and plain. I noticed the same thing with skin textures: the pores in someone's face were way too uniform when the steps were too low, but as you cranked them up, you'd get more of those defining features. Yes, the render times will be higher, but depending on what you're trying to make, I think the trade-off is worth it.

By the same token, you can take it too far. Here are some images of a face using a hundred steps in both the base model and the refiner, and it goes way overboard; that's not the look you're going for. This one pulls it back a little: I did about 50/50 here and tried a few different settings until I got something more natural. I think this one was probably the closest. Skin is one of those hard things: you can easily go too far and get too many blemishes and wrinkles, or go too far the other way, like this one, which I think is unrealistically smooth, that sort of "perfect skin" look. Another example of that: yes, these images are beautiful, but the skin just isn't quite right.

And those are my first impressions of Stable Diffusion XL 1.0. Let me know in the comments below if you have any questions; otherwise, we'll check you next time. See ya!
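The base-to-refiner handoff described earlier (a refiner start around 70 percent, plus the refiner's own step count) can be illustrated with a little arithmetic. This helper is my own simplification for explanation, not InvokeAI's actual scheduler code:

```python
import math

def refiner_schedule(base_steps: int, refiner_steps: int, refiner_start: float = 0.7) -> dict:
    """Illustrative split of work between the SDXL base model and refiner.

    refiner_start is the fraction of the base schedule completed before the
    refiner takes over; the refiner then runs its own step count on the
    partially denoised latents. A simplification of the real scheduler.
    """
    base_run = math.ceil(base_steps * refiner_start)  # steps the base model actually executes
    return {
        "base_steps_run": base_run,
        "refiner_steps_run": refiner_steps,
        "total_steps": base_run + refiner_steps,
    }
```

So with 30 base steps, 10 refiner steps, and a 0.7 refiner start, the base model runs 21 steps and the refiner finishes with its own 10, which is why raising either knob raises total render time.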
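To make the style-prompt idea from earlier concrete: conceptually, a style preset is a template that wraps your short prompt in style keywords and contributes its own negative terms. The style names below come from the video, but the template strings and the `apply_style` helper are invented for illustration; the actual text InvokeAI uses is not shown on screen:

```python
# Hypothetical style templates: "{prompt}" marks where the user's prompt is spliced in.
# The wording of these templates is invented for illustration only.
STYLES = {
    "photographic": (
        "cinematic photo of {prompt}, 35mm film, professional, highly detailed",
        "drawing, painting, sketch, cartoon, anime",
    ),
    "pixel art": (
        "pixel art of {prompt}, low-res, blocky, 8-bit style",
        "photo, realistic, blurry, highly detailed textures",
    ),
}

def apply_style(style: str, prompt: str, user_negative: str = "") -> tuple[str, str]:
    """Expand a (style, prompt) pair into full positive and negative prompts."""
    positive_template, style_negative = STYLES[style]
    positive = positive_template.format(prompt=prompt)
    # The user's own negative prompt and the style's negative terms combine.
    negative = f"{user_negative}, {style_negative}" if user_negative else style_negative
    return positive, negative
```

This is why switching the style from "photographic" to "pixel art" changes the whole look of the image even though your typed prompt stays identical.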
Info
Channel: All Your Tech AI
Views: 3,780
Keywords: sdxl 1.0, sdxl automatic1111, sdxl install, sdxl 1.0 automatic1111, sdxl stable diffusion, sdxl 1.0 install, stable diffusion, stable diffusion xl 1.0, stable diffusion xl download, stable diffusion xl vs midjourney v5, stable diffusion xl tutorial, stable diffusion xl local, stable diffusion xl model, stable diffusion tutorial
Id: LEmBseLxEI0
Length: 11min 54sec (714 seconds)
Published: Mon Aug 14 2023