Stable Cascade: The Open Source Champion From Stability AI

Video Statistics and Information

The announcement that Stable Diffusion 3 is coming out soon kind of overshadowed another really important announcement, and that's Stable Cascade. So let's talk about why it's such a big deal and how you can actually install it and use it on your own PC today.

Stable Cascade is brought to you by the same people that brought you Stable Diffusion, the folks over at Stability AI. What makes this interesting is that it's still a text-to-image model, but it's built upon something called the Würstchen architecture. The reason you should care about that is because Stable Cascade is exceptionally easy to train and fine-tune on consumer hardware, thanks to a three-stage approach. So let's talk about that a little bit.

You can see some examples here. We're going to dive in and run our own examples through Stable Cascade, but one of the big things is the adherence to the prompt: how well it actually follows along with the details inside of a prompt. If you want an object over here on the right and an object over here on the left, does it actually listen to you? How well does it do text? You can see there's coherent text in all of these example images. And then finally, how aesthetically pleasing are the images that it produces?

First, let's talk about the reason it's called Stable Cascade. You can see here that it's broken down into three stages: a Stage A, a Stage B, and a Stage C. There are also groupings in those stages: you've got the decoding layer, which is Stages A and B, and then you've got the generator layer, which is Stage C. Traditionally, when you fine-tune or train a Stable Diffusion model, you have to do so across the entire data set. This is really time-consuming, and it takes a lot of hardware and compute to do that. That's why it sometimes takes several hours to train just a simple LoRA file. The difference here with Stable Cascade is that the training and fine-tuning is done just at Stage C. There are also some under-the-hood changes that make this a much more powerful model
that'll run on lower-end hardware, and for that we can jump over here. You can see the model is built upon the Würstchen architecture, and its main difference from other models like Stable Diffusion is that it's working in a much smaller latent space.

Now what the heck does that even mean? Well, I'm glad you asked. In artificial intelligence, latent space refers to a mathematical space which maps what a neural network has learned from training images. Once it's been trained, it understands that all images of trees exist in a specific area, all images of birds in another. So it's really just a compression of that space.

Now, going back over here, why is that important? The smaller the latent space, the faster you can run inference and the cheaper the training becomes. So how small is the actual latent space for Stable Cascade? Well, Stable Diffusion uses a compression factor of 8, so a 1024x1024 image is represented and encoded at 128x128. Stable Cascade, on the other hand, achieves a compression factor of 42, meaning it's possible to encode a 1024x1024 image down to 24x24.

If you watched my video on how Stable Diffusion works, you'll know that that compressed image space, that latent space, is where it actually does the image steering with the text. So when you provide a text prompt, the image is steered based on that latent space. By having it compressed so much, down to that 24x24, you can get a 16 times cost reduction over Stable Diffusion 1.5.

What this all means for you and me is that we can train these models on much lower-end hardware, and we can train much faster on the same GPUs. So for Stability AI, with their massive GPU clusters, they might be able to train a brand new Stable Cascade model in a few hours or a few days instead of weeks or months.

The new architecture still supports all the things we know and love: LoRA models for training and fine-tuning based on styles and aesthetics, ControlNet, IP-Adapter, LCM. All of these things are still possible with this new model. And from a
visual evaluation point of view, this is how the images actually look. They actually did some really interesting things here. They compared Stable Cascade at 30 inference steps (this is the number of times an image is iterated over within Stable Diffusion) against Playground v2 at 50 inference steps, Stable Diffusion XL at 50 inference steps, and SDXL Turbo at one inference step. The results across the top here are prompt alignment, and you can see here on the right that Stable Cascade is much better at prompt adherence: actually putting details in images at the right place, as you specified in your prompt. The other piece is the aesthetic quality. This is down below, and you can see the aesthetic quality is much higher against all of these except for Playground v2, which it's pretty much on par with.

Why does this matter? Why do prompt alignment and aesthetic quality matter? Well, it's how good the images look, but also how precise they are. This is just a fun tool, a fun gimmick, if you aren't able to precisely make an image exactly how you want it. You've got to be able to say, "Hey, I want a plant over here, I want a light over here, I want my hands up in the air." It's got to do what you ask it to, and we're slowly getting there. We've seen better text generation from recent models, and we're going to see even more prompt adherence. I think DALL-E 3 is probably the reigning champ, and it makes sense: OpenAI and ChatGPT are really good at understanding language, and so they've baked that into their model. Stability AI and the open source world are really getting into that as well.

The other really cool thing to mention here is that this runs on various specs of hardware, and I think you're going to see more of that from Stability AI going forward. Stage C comes in a 1 billion and a 3.6 billion parameter version, so two different sizes, and they obviously highly recommend the larger one because the most work went into fine-tuning that. There are two versions
for Stage B as well, which amount to 700 million and 1.5 billion parameters. And lastly, Stage A contains 20 million parameters and is fixed due to its really small size. So you can run this on low-VRAM or high-VRAM systems, and you can pick and choose the models and the different settings that you want to use in order to do so. I think that's really cool, because it sort of democratizes the open source nature of the world going forward, and that's something that's awesome.

Installing this is relatively straightforward, but it's not quite as easy as just downloading the model and dropping it into Fooocus or AUTOMATIC1111, if you've already got one of those installed and you've followed one of my tutorials. You need to install Gradio and Accelerate, and then you need to install the actual diffusion models from Würstchen v3. Once you've done that, you can just go here and fire up the Gradio app. This needs a special Gradio app that's baked in with all the details you need in order to run Stable Cascade. It's going to get easier going forward, but for now that's what you have to do.

And if that all feels a little too daunting, you can always sign up or head over to my Patreon page. Over at Patreon I have a how-to on installing Stable Cascade with an auto installer. It's a one-click installer: you press a button, and this batch file is going to download and install Python and get every single thing that you need. It's even going to install Gradio. And there's a launcher, another one-click button: you press it and you can launch the app. Really super easy and simple, so don't worry, I've got your back. That link's down below.

All you have to do to run that installer is drop it into a directory, in this case "stable Cascade", double-click on it, and it's going to launch a window. You can see it's taking care of everything for us: it's downloading everything that we need and running all the packages. All you have to do is sit back, relax, grab a cup of tea, have some fun. When that's done, you're going to
have this brand new Stable Cascade folder. When you go into that, simply drop in my Stable Cascade launcher and click on that, and that's going to launch the terminal window that you need in order to run this new custom version of Gradio. That's going to fire up, and then you're going to have a web browser window pop up that's going to have Stable Cascade up and running in just a few seconds.

And I'll give you a tip for Gradio: when it starts up, you can add ?__theme=dark to the URL and it's going to take it into dark mode, a little bit easier on the eyes.

You can see down at the bottom "food photography of a delicious steak" and "futuristic soldier": these are example prompts that I've added to this system, so you can get started by just clicking on one and hitting the Run button. Just like every other version of Stable Diffusion you've had, you're going to get this sort of static, funny-looking image, and it should run fairly quickly, especially if you have decent high-end hardware. Now, one thing to note: I am using up about 20 GB of VRAM with this early preview release of Stable Cascade, so hopefully you've got a bit of a beefy system. You can see the results come back in just a few seconds, and it's really high quality, really aesthetically pleasing.

What we're going to do is some side-by-side comparisons between Stable Diffusion XL and Stable Cascade, so we can see apples to apples how they stack up. On the right-hand side you can see I'm using Pixel Dojo.
This is my own personal project. You can go over, join, create an account, and start using different diffusion models. I've got Stable Diffusion XL, Juggernaut, Anime XL, even Stable Diffusion XL Lightning. You can also do some really cool things with AI image upscaling, which I just released out of beta a couple of days ago. As you can see, it takes your original images here on the left, from Stable Diffusion or any other source, and it adds a whole bunch of new details to them, making them photorealistic. And of course there are a bunch of open source language models you can play around with, including Mistral, Mixtral, Llama 2 70B, and even Google Gemma, the brand new one that was just dropped a few days ago.

But today, again, we're going to focus on just doing a side-by-side comparison of images. I've got a few examples down here that I include, and when you click on them (let's take this dragon, for example) you can see that it prefills the prompt and the negative prompt. So we're going to go ahead and just copy these over to Gradio and use the exact same settings; we're not going to change anything else. Now, if you're running both Stable Cascade and Stable Diffusion XL on your own hardware, Stable Cascade should come back faster. In this case it's not going to be faster because, well, Pixel Dojo is using Nvidia A100 GPUs versus my 3090 here on my local PC.

When we're done, let's take a side-by-side look at the results. I would say in this case these two came out pretty similar. You can see the detail of the gems in both of them, both dragons. I don't like the details of the eyes in either of these; I think they could be a little bit more aesthetically pleasing. But a very similar result from the two. So we're really looking for two things here: how well it adheres to the prompt, so every single detail that's presented in the prompt, and then just objectively, how does it look? How aesthetically pleasing is one image over the other? For the next one we're going to say: a beautiful
lady, freckles, big smile, ruby eyes, short hair, dark makeup, hyper-detailed photography, soft light. Let's generate for both.

OK, so interestingly, neither system really adheres to the prompt completely. It is a shoulder-up picture, and it's a beautiful woman with freckles. It did give them both red hair, although the one on the left I would say is ruby hair more than red. Neither of them has ruby-colored eyes; they both look like they have kind of greenish-grayish eyes, so it seems to apply that ruby color to the hair rather than the eyes. I will say, from an aesthetic point of view, the one from Stable Cascade does look generally better and more realistic.

Next up, let's test that prompt adherence a little bit more: moody aesthetic, beautiful cozy cramped bedroom with floor-to-ceiling glass windows overlooking a cyberpunk city at night, torrential rain downpour. Let's see what it comes up with for both of those.

So both images have that dark, gloomy look that it called for. You've got floor-to-ceiling windows, and it looks like it's overlooking a cyberpunk city at night. There's that torrential downpour. It looks a little bit weird over here on the right-hand side. Is that water in the middle of the floor? Looks like it might be. There are also some weird abnormalities going on with some of the details in the room, but it does look like it's more cramped on the right-hand side, perhaps. The left looks more aesthetically pleasing; it looks more modern and open, sort of a very clean look. I'll say both did a good job of following the prompt, but on the left you get a bit more aesthetically pleasing image out of it.

This next one: medium shot, adorable creature with big reflective eyes, moody lighting, full body portrait, real picture. And here we go, both of these guys are pretty freaking adorable. Now I will say the one on the left looks much higher quality in the details. The one on the right has sort of jagged, rough edges, and it's kind of blurry in the middle, a little bit off. The one on the left:
super high quality, very aesthetically pleasing. So I think, again, this is why Stable Cascade is scoring higher on the aesthetics. You can see it in the results.

All right, for this one: sunlight filters through a dense rainforest canopy, illuminating a tiny robot, rusty and weathered, its eyes still glow. This one's slightly more of a toss-up. You can see both of them look really high quality. I like the depth of field and the lighting; everything else looks great in both images. I will say the one thing is this guy on the right looks a little bit more haphazardly put together. Maybe that's what you're going for, maybe it isn't, but the one on the left is a little bit more clean and polished in the way that all the shapes of the different body parts align together. But I'd say both are quality images out of these two models.

For this next one, we're going to actually make this a series of images using the same prompt but building on top of it. So I'm going to say "a group of cats taking a selfie", and then we're going to see when this starts to fall apart. All right, Stable Diffusion XL, I don't know what you're doing. You're already kind of letting me down on this one. You've got this camera kind of floating in the air; I guess that's the selfie camera. Otherwise both of them are really similar results. You've got a group of cats (I see five cats in both images), they're all kind of looking at the camera, getting ready, kind of clumped together for a selfie. I'd say an overall good result, if it weren't for this weird camera in the middle.

So now we're going to add on to that. We're going to say "a group of cats taking a selfie, holding up a sign that says no dogs". This is going to test not only the aesthetics of the image but also how well it adheres to the prompt: they've got to be holding the sign up, and then does it actually have coherent text?

OK, super interesting. I don't know why it went straight to kind of a cartoon aesthetic over there on the right-hand side. It says no
do does; it can't really get the text coherent. On the left, man, look at that text. That looks like a font face that was just typed on here, like something you'd do straight out of Photoshop. It's also actually holding the sign with these weird cat hands, these furry hands at the bottom. But I've got to say, it sort of nailed it.

We can push it a little further. Now we're going to add "the cats are inside an ultra-modern cafe with a group of people standing behind them". On the right, again, it went with that kind of cartoonish feel. "No do no": it's still not getting the text right. It's got a bunch of arms randomly up in the air, with people supposedly holding a camera maybe, or something. I don't know, a really odd one. It sort of nailed it on the left, though. You've got cats in the foreground taking the selfie, now you've got a person holding a sign in the back that says "no dogs", still coherent text, and it looks like they're inside of some sort of modern building.

I guess we've got to keep pushing this further until we break Stable Cascade. On top of all that, we're going to add "the cat on the left is drinking a cup of coffee". All right, I think that's where we've reached the limit of these capabilities. You can still see "no dogs" on the sign on the left, although at the top it's now losing its coherence: "no dogs are inside", but it's sort of broken English. As for the people, it looks like this guy over here on the left has a fist going into his face, and this guy on the right looks like he might be growing cat ears and has like 15 fingers on his hand. Something's going on here. And on the right, it does actually have a cup of coffee sitting in front of the cat on the left, so it did get that. They're inside of a cafe, but there are no people; the cats are sort of dressed as people, and the text is still kind of wonky.

I think we pushed Stable Cascade about as far as you can, but you can see that overall it has a very nice adherence to the prompt, and that's exactly what we need going forward. It also produces some really high
quality images, and I think that this underlying Stable Cascade is what Stable Diffusion 3 is actually being trained upon. I think what they're doing is fine-tuning the model and making it even better by adding even more steerability to the text layer, that sort of part C of the pipeline, if you will. That remains to be seen.

I'm looking forward to Stable Diffusion 3 coming out, and you know I'm going to do a video as soon as it drops. Don't forget to hit those links in the description below if you want to check this out on Pixel Dojo AI or you want to install with the one-click installer. Otherwise, hit that like and subscribe button; it really helps me out. As always, I'm Brian Lovett, and remember: all your tech are belong to us.

[Outro song, partly unintelligible] ...from basics to complex, never let you down; All Your Tech AI, earning the renown.
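
As a rough check of the latent-size figures quoted in the transcript (a compression factor of 8 giving 128x128, and a factor of 42 giving 24x24, for a 1024x1024 image), here is a minimal Python sketch. The function name latent_side and the choice to round down are assumptions made for illustration; they are not taken from the video or from Stability AI's code.

```python
def latent_side(image_side: int, compression_factor: int) -> int:
    """Side length of the latent grid, rounded down (an assumption)."""
    return image_side // compression_factor


# Figures quoted in the transcript for a 1024x1024 image.
sd_side = latent_side(1024, 8)        # Stable Diffusion: factor 8
cascade_side = latent_side(1024, 42)  # Stable Cascade: factor 42

# Number of latent positions in each grid.
sd_positions = sd_side ** 2
cascade_positions = cascade_side ** 2

print(f"Stable Diffusion latent: {sd_side}x{sd_side} = {sd_positions} positions")
print(f"Stable Cascade latent:   {cascade_side}x{cascade_side} = {cascade_positions} positions")
print(f"Ratio of positions: {sd_positions / cascade_positions:.1f}")
```

Note that the ratio of latent positions printed here is only a count of grid cells. It is not the same measurement as the 16 times cost reduction mentioned in the video, which is quoted as-is from the speaker.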
Channel: All Your Tech AI
Views: 1,095
Keywords: stable cascade, stability ai, stable cascade ai, ai art generator, ai tools, ai news, stable cascade install, stable diffusion, stable cascade ai model, ai image generation, how to run stable cascade pc, stable cascade tutorial
Id: 26GC7JuavIo
Length: 17min 22sec (1042 seconds)
Published: Thu Feb 29 2024