Real-Time Text to Image Generation With Stable Diffusion XL Turbo

Video Statistics and Information

Captions
"landscape of a Japanese garden in autumn with a bridge over a koi pond"

All right, so we're going to be checking out real-time text-to-image generation, which is really cool. I actually haven't been doing many AI videos on my channel because they don't view well, so I mostly keep this stuff to myself. I do enjoy playing with AI, like text-generation-webui or Stable Diffusion and all this other stuff; I've just been having a lot of fun with it. But this is such a cool thing that I have to show it off. The most impressive part is that as I'm typing what I want, it generates the image. Yes, real-time image generation is really cool. Anyway, let's jump onto the desktop.

Here we have the website from Stability AI, the company that released the model. They talk a little bit about this whole text-to-image generation model and where you can get it. We're actually going to get it from Hugging Face, which I already have open, and these are the two models you want; I used the fp16 one, but you could use either and it works. The interface we'll be using is called ComfyUI. It's similar to what you get with AUTOMATIC1111, but it's node-based, so I'll show you what I mean: it's basically a breakout of the pipeline, and you can perform different tasks. Instead of saving an image, you could preview it instead, which saves you time, and there are other nodes you can add or remove. You could even generate an image, send it to an upscaler, and then output the result, so you can decide exactly how the image generation should flow. It's a bit more advanced than AUTOMATIC1111's Stable Diffusion setup, and it has one feature that tool does not: auto-queue, which is what enables the real-time generation. To install this, we need a couple
of things: Python and the drivers for your graphics card. I did run this with ROCm 5.6 before on the AMD card I had, but it was really slow because I only have an RX 580, and that's really, really old. I put a 1070 in here, which is still not as fast, but it works with this setup, so we're just going to install the CUDA version instead. This is basically how you get everything working, so let me jump into it.

Here we have our little terminal; let me make this a bit bigger. First we grab ComfyUI with git clone; it's very small. Next I'm going to make a directory for the environment, and in there run python3 -m venv py, which turns that directory into a Python environment. Then we activate it with source py/bin/activate. The reason I'm doing this is that we're downloading a lot of stuff through pip, and you want to keep your environment enclosed so you don't break your system. Now that the environment is set up, I just grab the version of PyTorch I need, which is the CUDA one (you could use the nightly version if you want), paste the command in, and let it run. It's a pretty big download for the CUDA build; I thought it was about 2 GB, and yeah, it was roughly 2.2 GB. Once everything is downloaded, we switch directories back into the ComfyUI folder we cloned earlier and run pip3 install -r requirements.txt, which installs the rest of what ComfyUI requires. That's it; we can start it up as soon as this is done.

All right, with everything finished, I clear the screen and run python3 main.py. Because I don't have much VRAM, I'm going to run it in low-V
RAM mode (the --lowvram flag). If you want to be able to host this on your network, you can add --listen and it will bind to every IP address, but for now I'm just using --lowvram. And there we have it, our little web UI, so all we have to do is navigate to 127.0.0.1:8188, and there's our ComfyUI interface.

I do still need to download the safetensors file and move everything over, so give me a second. All you have to do is copy the models into their folder: go to ComfyUI/models/checkpoints and paste your models there, and ComfyUI will detect them. Just hit refresh and you can choose the ones you have; we could use either the fp16 or the regular one, because I downloaded both. Then hit Queue Prompt, and it runs everything in the background for your first image generation. And there we go, we have our first image, and it's already saved, because the graph is set up to save the image every time you generate. You can make the nodes bigger if you want; you have your 512 x 512 width and height, how many batches, what the seed number is. There are a lot of settings you can configure here that you'd also find in the Stable Diffusion web UI, so things aren't too different.

But you can add new nodes to this. Say that instead of saving the image I want to preview it: I just double-click on any empty spot, which opens the search, and grab Preview Image. Then, instead of saving the image, I drag the output over to Preview Image, and you can get rid of the Save Image node if you want. Now if I run the same prompt again, oh, I've got to remove this first, then Queue Prompt, and it runs again. Any time you see a green bar, that's the node it's currently working on. Right now it's still on the image generation itself; it's set to 20 steps, which is a lot for my 1070 right here, but it can still do it, and then it will
process the result, hand it over to VAE Decode, and then pass that to Preview Image, which is really cool; I really like how this whole thing works out. There you go: it turned green and previewed the image.

To use Stable Diffusion Turbo, we need to change this graph quite a bit to get it working the way we want, so what I'm going to do now is set up the workspace and add a few nodes. First, we need SDTurboScheduler. Okay, then we need KSamplerSelect, and then SamplerCustom, right there. We're going to get rid of the old KSampler and move everything over to SamplerCustom because of the extra settings it exposes. Basically, the turbo scheduler handles your steps, and since we have that here, we don't need steps in the sampler anymore; that's why we're using a custom sampler.

Now we have to wire everything up from here to here. The model output goes to the scheduler's model input, and the sigmas go from the scheduler to the sampler's sigmas input; you can see the matching socket light up compared to the other ones, so that's where to connect it. You have your conditioning for your positive prompts over here, and your conditioning for your negative prompts over here. Keep in mind that negative conditioning does not actually work on this model, the SDXL Turbo one, so you don't really need it, but you do have to have something connected. The latent has to move over here, and the connection from our old KSampler has to move over to the new sampler. And again, we haven't done our negative prompt, because we actually wired it to the positive input by accident, so let's fix that: remove it, move the positive prompt down here, and move our model over here. Now we can
get rid of the old node, move the output over to our sampler, and send the image back over to Preview Image. This looks like a huge spiderweb of stuff, but it makes sense once you trace everything through. With that said, I should now be able to generate much faster. This is still running on my 1070, so it's not as fast as I'd like, and you can see the difference in quality, because we're only using one step instead of 20. Now that you've seen the whole setup, I'm going to jump over to my desktop, which has a 3080 instead, and you'll see the difference in speed.

All right, so here we have the same setup on my desktop PC with the 3080, and if I hit Queue Prompt it's almost instant. It's much better than my 1070, believe me, way better. And one of the features I mentioned earlier that AUTOMATIC1111 doesn't have is this extra option right over here: as soon as I check it, I get Auto Queue, and auto-queuing is what allows the real-time image generation. After enabling it, I hit Queue Prompt and it just keeps running in the background; you'll see the queue size flip between 1 and 0 from time to time, but now if I delete the prompt text, it automatically generates a new image. I can type whatever I want in here, so let's try "cute dog", oh okay, "with top hat in grass field running". Look at that: instant image generation, generating as I'm typing. The quality isn't as great, but you can change the stepping and it gets better as more steps are introduced. Still, this model is not perfect; there are issues with hands and fingers, and it's not trained to the point of being as polished as the other models, but you can at least get a general idea of what you want, and it runs really, really fast. Now let's try another image; let's say "landscape of a Japanese garden in
autumn with a bridge over a koi pond". Look at that. Now I'm going to change it back to one step; that was using three steps, so now you're seeing one step, and that's how cool it is. If I want to say something else, like "dystopian", oh, look at that, "future with spaceships, cool neon lights", that's cool; look, it just changes right over. "Anime girl": see, the whole style changes. The anime girl doesn't look as good; you can see the hands don't work, the waist looks a little off, and you can't really see the face clearly. Let's try "running"; yeah, that's even worse with the face, but it's instant, it's so quick. So I wouldn't really use this for subjects like people, but you can do stuff like this: if I delete certain parts of the prompt, look how cool this is, it just changes. Instead of spaceships, let's change this over to cars. Look at that, instant. It's just a lot of fun playing around with this.

Anyway, that's it for the AI generation. If you'd like more videos like this, please let me know down in the comments below, because I've done quite a few AI videos in the past and they don't do very well, which is why I haven't pushed out more. But if you guys are really interested in this, let me know down in the comments below. And if you're new to this channel, consider subscribing and hitting that bell notification icon so you know when the next video is going to be out. And as I say from my nerd cave: hack till it hurts.
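The rewired Turbo graph can also be expressed in ComfyUI's API-style workflow JSON, which is what Queue Prompt submits under the hood. This is a sketch from memory of that format, with hypothetical node IDs, so treat the field names as approximate; the key points match the video's wiring: SDTurboScheduler supplies the sigmas (steps) to SamplerCustom, cfg is 1.0, a negative conditioning is connected even though SDXL Turbo effectively ignores it, and the decoded image goes to Preview Image rather than Save Image.

```json
{
  "1": {"class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sd_xl_turbo_1.0_fp16.safetensors"}},
  "2": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "cute dog with top hat in grass field", "clip": ["1", 1]}},
  "3": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "", "clip": ["1", 1]}},
  "4": {"class_type": "EmptyLatentImage",
        "inputs": {"width": 512, "height": 512, "batch_size": 1}},
  "5": {"class_type": "SDTurboScheduler",
        "inputs": {"model": ["1", 0], "steps": 1, "denoise": 1.0}},
  "6": {"class_type": "KSamplerSelect",
        "inputs": {"sampler_name": "euler_ancestral"}},
  "7": {"class_type": "SamplerCustom",
        "inputs": {"model": ["1", 0], "add_noise": true, "noise_seed": 0,
                   "cfg": 1.0, "positive": ["2", 0], "negative": ["3", 0],
                   "sampler": ["6", 0], "sigmas": ["5", 0],
                   "latent_image": ["4", 0]}},
  "8": {"class_type": "VAEDecode",
        "inputs": {"samples": ["7", 0], "vae": ["1", 2]}},
  "9": {"class_type": "PreviewImage",
        "inputs": {"images": ["8", 0]}}
}
```

With Auto Queue enabled, every edit to the positive prompt's text re-queues a graph like this, which is what makes the generation feel real-time at one step.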
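The terminal steps in the walkthrough above boil down to a short shell sequence. This is a sketch rather than the video's exact commands: it assumes a Linux machine with Python 3 and an NVIDIA driver already installed, and the PyTorch index URL (cu121) is one current CUDA wheel index, which may differ from the one shown in the video.

```shell
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git

# Create and activate an isolated Python environment so pip
# installs don't touch the system packages
python3 -m venv py
source py/bin/activate

# CUDA build of PyTorch (a roughly 2 GB download)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# The rest of ComfyUI's dependencies
cd ComfyUI
pip3 install -r requirements.txt
```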
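Once the SDXL Turbo checkpoint is downloaded from Hugging Face, it goes into ComfyUI's checkpoint folder, and the server options mentioned above map onto command-line flags. Only the mkdir below is a live command; the copy and launch lines are left as comments because they depend on where your download landed and what GPU you have (the filename shown is the fp16 file referenced in the video).

```shell
# ComfyUI picks up models placed under models/checkpoints
mkdir -p ComfyUI/models/checkpoints

# Copy whichever SDXL Turbo file you downloaded, e.g.:
#   cp sd_xl_turbo_1.0_fp16.safetensors ComfyUI/models/checkpoints/
# then hit "refresh" in the web UI to see it in the checkpoint dropdown.

# Launch options (web UI is then at http://127.0.0.1:8188):
#   python3 main.py --lowvram            # low-VRAM mode for older/smaller GPUs
#   python3 main.py --lowvram --listen   # also bind 0.0.0.0 for LAN access
```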
Info
Channel: Novaspirit Tech
Views: 7,203
Keywords: novaspirit, tech, stable diffusion xl, stable diffusion xl turbo, sdxl turbo comfyui, comfyui, sdxl turbo, stable diffusion, ai, automatic1111, comfyui tutorial, stable diffusion comfyui, sdxl, real-time text to image, real-time
Id: SsvaLtS2JIo
Length: 12min 33sec (753 seconds)
Published: Thu Dec 21 2023