Generating Realistic AI Images with Stable Diffusion

Video Statistics and Information

Captions
What is going on, guys? Welcome back. In today's video I'm going to show you how you can create your own AI-generated images based on a simple text prompt using Stable Diffusion. So let us get right into it.

All right, so in this video today we're not going to write any code, we're not going to implement our own solutions, and we're not going to train our own AI models. We're going to make use of an existing project called Stable Diffusion: we're going to use an existing model and generate AI images based on text prompts, and maybe based on input images. Those are the two main functionalities of Stable Diffusion.

We can go to the GitHub repository (you will find all the links that you need in the description down below), and when we scroll down you can see some sample images that are the result of a simple text-to-image generation, where you provide a text prompt. I don't know what the prompt for those images was, but maybe something like "two cats on a", what is this, "a radio" or something like that, and then it generates images like those. You can also provide an input image as a guideline together with a text prompt, like down below: this is the input image, a simple sketch of a landscape, and then you provide "a fantasy landscape" and so on as a prompt, and those two images are the result of this combination.

So this is what we can do with Stable Diffusion. This video today is purely a setup guide, an installation guide you could say, because I'm not going to explain any of the inner workings of Stable Diffusion. I'm not going to go into the theory; I'm just going to show you how to get this running on your system, provided that you have the necessary hardware. This is just a practical guide on how to get this on your system and how to use it to generate AI images, and I want to show you some impressive results. That is the main purpose of this video.
The first thing we need to do is go to the official Stable Diffusion repository, CompVis/stable-diffusion. As I already mentioned, the links are going to be in the description down below. We want to download all the files, so we can just click here on "Code" and download it as a zip file; you can of course also clone it using Git. I'm going to save all this on my desktop, and this is going to take some time.

While this is downloading: we're going to use this main repository to do the basic installation, but later on (at least this is what I had to do on my system to get better results) we're going to use this other repository, which is an optimized repository from, I don't know exactly how the name is pronounced, basujindal. It is basically a fork of the official repository, but it has this specific folder, optimizedSD, which basically means that we don't need as much VRAM on our GPU. You will probably need an NVIDIA GPU, and the more VRAM you have, the more features you will be able to use and the higher the resolution you can use. This optimized version just runs on lower GPU VRAM, as it already says in the description. So we're going to install the one from the official repository, and then, instead of the scripts folder, we're going to use the optimizedSD folder, and then we basically have a version that runs more efficiently on less VRAM.

So that's that. Now, I don't know how much time this will take to download, so I'm just going to skip to the part where we already have this on our system.

All right, so the two downloads are now finished. I also downloaded the second repository so that we already have it on our system. Now we can just open the directory, right-click, and I'm going to use 7-Zip to extract all of this into a directory.
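The folder step described above (taking optimizedSD from the fork and putting it into the official repository) could also be scripted. This is only a minimal sketch: the function name `add_optimized_folder` and both directory arguments are hypothetical, and it assumes both zip files have already been extracted somewhere on disk.

```python
import shutil
from pathlib import Path


def add_optimized_folder(fork_dir, main_dir, folder="optimizedSD"):
    """Copy the optimizedSD folder from the extracted fork into the main repo.

    fork_dir and main_dir are hypothetical paths to the two extracted
    repositories; only the folder name comes from the video.
    """
    src = Path(fork_dir) / folder
    dst = Path(main_dir) / folder
    if not src.is_dir():
        raise FileNotFoundError(f"{src} not found; was the fork extracted?")
    # dirs_exist_ok (Python 3.8+) lets the copy be repeated without errors.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

Dragging the folder over in a file manager, as done in the video, gives the same result; the script is only useful if you repeat the setup on several machines.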
What we need to do is just follow the installation instructions of the main repository. We go to the Requirements section, and it says we need to create a conda environment, so an Anaconda environment. If you don't have Anaconda installed, you're going to have to install it. It's quite simple, but I also have a tutorial on this channel where I explain how to do it. I think it's part of my data science series, so it's a very old video, but you can check it out if you need some help with installing Anaconda.

I'm just going to open up my command line, navigate to the desktop, and navigate to the Stable Diffusion directory. In here I'm going to run conda, which is an Anaconda command: conda env create -f environment.yaml, where environment.yaml is the file that defines the environment. We're just going to run this. It's probably going to have to do some installations, so this is going to take some time, I guess. After this, we have to activate the environment ldm, which is the name that this environment has based on the YAML file. Every time we want to do something with Stable Diffusion we will need to have this environment activated, because it's going to use the packages from this environment as a basis.

All right, so once all of these dependencies are installed, we can run conda activate ldm, and then we are inside the environment that we need to be in. What we're now going to do is add the optimizedSD folder. We open up the second zip file, open up our directory, go into the zip, and extract the optimizedSD folder into our directory. I'm going to close this now. Essentially, what we now need is the model that is actually going to do the generation. So we go back into the initial repository and scroll down to this section
here, Stable Diffusion v1, and we want to click on this link, where it says the weights are available via the CompVis organization at Hugging Face. We click on this link to navigate to Hugging Face, and here we have a bunch of models to choose from. The one that we're going to download is stable-diffusion-v-1-4-original; this is the one that is compatible with what we're trying to do. We click on this one, and we want to download this file, sd-v1-4.ckpt. I'm going to download this and place it into the Stable Diffusion directory, under models, then ldm, and then into a new directory called stable-diffusion-v1. Inside of this directory we save it, but we save it as model.ckpt. This will take some time; I think it is about four gigabytes or something in size, so we're going to skip that part as well.

All right, so now the download has finished, which means that we're done with the setup process, the installation process. Now it's all about figuring out how to actually use Stable Diffusion, how to actually generate the images. This is explained in quite simple terms in the optimized repository. We can just scroll down and see exactly what commands to use to generate from image to image or from text to image. There is also a graphical user interface that we're going to take a look at in a second, but let's start with the command-line usage. We go into the command line again. Actually, we don't need to navigate to the desktop, because we are already in the correct directory. What we want to run is essentially the command that we have here: python or python3, depending on your system, then optimizedSD and either the image-to-image or the text-to-image script, then a prompt, and then
some settings. There are many settings that we can choose here, and this is actually why I prefer to have the graphical user interface. Usually I'm a command-line guy and I prefer to have commands instead of buttons and sliders and stuff like that, but in this case a graphical user interface just gives you a better overview. We're going to take a look at it in a second, but let's just copy this command, paste it here, and change the prompt from "cyberpunk style Tesla" to something else. Let's change it to, I don't know, "purple cat playing tennis against Super Mario". Let's also say that we want smaller images so that it's a little bit faster, and let's do one iteration with just one sample, because I want to move on to the graphical user interface.

Let's just run this and see if it works. If it works, we should see some progress bars in a second. Loading model, found the model, and it seems to actually work. Depending on your GPU, your VRAM, and your system as a whole, this is going to be faster or slower, obviously. I think usually it's faster, but maybe because it's the first time I'm running a command it takes some more time. Actually, I think the problem is that I'm recording. The recording itself is just fine, but it's massively slowing down this process. Usually it's way faster: for one image of that size it usually takes, I don't know, five to ten seconds on my system, but since I'm recording it takes longer. Still, we should see a result. The result is going to be stored in the outputs directory, under txt2img-samples, once it's done, and for each prompt that you have you're going to have a separate
directory. So if I go into txt2img-samples, you're going to see "purple cat playing tennis against Super Mario", and, okay, this is garbage. This didn't work well. But instead of showing you now how to do it better in the command line, I'm going to show you the graphical user interface, because there we can also use some more samples and we're going to get some good results. This is not what you usually get; it's just a bad example, and I'm going to show you that we end up with some pretty impressive results.

In order to run the graphical user interface, we need to install a package called Gradio, so pip install gradio is what we need here. With this we can then run text-to-image and image-to-image as a graphical user interface; there are separate scripts for that. Once we have this installed, and I think it's also listed here, we can just call python optimizedSD/img2img_gradio.py or python optimizedSD/txt2img_gradio.py. We're going to copy this command to open the text-to-image version. This is going to run, as I said, on localhost, so once it has started it's going to provide us with a port number. I think it's 7860.
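The two ways of launching the optimized scripts mentioned so far, the plain command-line call and the Gradio interface, can be sketched as small command builders. Treat this as an illustration rather than the fork's documented interface: the script paths and the flags `--prompt`, `--H`, `--W`, `--n_iter` and `--n_samples` are written here as I recall them from the fork's README and should be checked against the repository, and the function names are hypothetical.

```python
import sys


def txt2img_command(prompt, height=512, width=512, n_iter=1, n_samples=1,
                    python=sys.executable):
    """Build the argument list for the optimized text-to-image script.

    Flag names are assumptions based on the fork's README; verify them
    before relying on this.
    """
    return [
        python, "optimizedSD/optimized_txt2img.py",
        "--prompt", prompt,
        "--H", str(height),
        "--W", str(width),
        "--n_iter", str(n_iter),        # how many times to repeat a batch
        "--n_samples", str(n_samples),  # images per batch
    ]


def gradio_command(mode="txt2img", python=sys.executable):
    """Build the argument list for one of the two Gradio scripts."""
    if mode not in ("txt2img", "img2img"):
        raise ValueError("mode must be 'txt2img' or 'img2img'")
    return [python, f"optimizedSD/{mode}_gradio.py"]
```

With the ldm environment activated and the repository root as the working directory, something like `subprocess.run(txt2img_command("purple cat playing tennis against Super Mario", height=256, width=256), check=True)` would start a generation. The 256 by 256 size is only an example of "smaller images"; the video does not state the exact values used.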
As far as I remember, at least. Let's see how long this takes. There you go: this is the IP address, and this is the graphical user interface. What we do here is provide a simple prompt, so let's go again with "purple cat playing tennis against Super Mario", and we say that we want this resolution. Here we have different samplers, and this is quite interesting. First of all, I'm going to enable the turbo option to just make everything faster. Depending on the sampler you use, you get a different style of image. I'm not entirely sure what style each sampler gives you, I cannot really put it into words, but you will notice, for example, a difference between using PLMS and Euler on the same prompts. If you generate 100 images using this one, you're going to get a different style than using that one, so you can play around with that.

We're now going to say that we want four images and one iteration. Batch size four basically means that we get four images at once, and iterations means that we do this n times. So if I say two iterations with batch size four, it's going to give me eight images: first four, and then the second four. If I say batch size eight and one iteration, it's going to give me eight images at once. I can also say batch size one and eight iterations to get the images one by one. Of course, a higher batch size takes longer until you get a result, whereas with iterations you always get the results in between.

Let's just run this and see what happens. If I submit this, you should be able to see the progress in the command line. Maybe this is going to be faster now due to the turbo checkbox that I checked, but I'm going to speed this up.

All right, so now it's done, and we can see that the results are actually quite funny. We don't really get a purple cat in this image, maybe, but this is actually Mario as a purple cat. It's not a purple cat playing tennis against Mario, but you can see that at least to some
degree it produces what we're looking for. Now, this might not be the best example. I'm going to show you some examples that I already did with this setup, so not with another model, not with other resources, with this exact setup. I have some examples of different prompts and the resulting images that are quite impressive, and I'm going to show you those. But I want to generate something interesting here, so let's go with a different prompt. Let's say something like "a fantasy portal", or maybe "a blue fantasy portal leading to a mysterious world full of clouds and", I don't know, "demons" maybe. This is quite specific, and it should be able to deliver that, so we're going to run this one more time. I'm then also going to show you the image-to-image version, how to provide a sample image and generate a new image based on an input image, and then I'm going to show you the examples that I already have.

All right, so here you can see the results. This one is actually a game card, and this one is quite cool. Here we actually have what I was asking for, at least to some degree: we see a blue fantasy portal leading to a mysterious world full of clouds. I don't see any demons here, but here we have a blue portal, and here we have demons, maybe. So that's actually quite impressive already. Think about this: this is an AI. It doesn't understand any of these words. It doesn't know what blue is, it doesn't know what fantasy is, it doesn't know what a portal is. It just takes this text input and generates those images, which is quite impressive if you think about the fact that I could write anything. Now, as we saw, a purple cat playing tennis against Mario seems to be more difficult than this, but it is still very impressive.

So let's go ahead and provide a sample image. Let's use the other one, the image-to-image version, and here we're going to draw a simple guideline. So I'm
going to run this here. I'm then going to open up Paint and say that I want to have 512 times 512 pixels as an input, and I'm going to just provide some simple stuff. I don't know, maybe we're going to have some grass here, then maybe a blue sky, and then maybe some mountains like this. Maybe I want to say that there's some lava, so this one is actually a volcano; I'm going to just fill this with orange. And maybe I want to have some mountains here that don't have any lava. There you go. One prompt that I could use here is maybe "Austrian Alps with volcano". The Austrian Alps don't have volcanoes, but I could just say that. Maybe I can add a cow or something, just some basic animal. I hope this is not too confusing for the model; it shouldn't be. Let's just close this shape, add the colors, maybe some white circles. I don't know how to draw a cow, but let's just do it like this. So very abstract, not a really beautiful drawing, and chances are this is not going to turn into something good. Oftentimes what it does is just take this image and produce a very similar version of it, not really beautiful. But we can try, and if this doesn't work out so well, I have some examples of something that did work out kind of interestingly in the past already.

So let's just save this as input.png and open this up in the browser. Here I can now drop the image, so I can go into this directory and load input.png, and I can say "Austrian Alps with cow and volcano", or maybe "active volcano in the background", something like that. Then maybe I can say batch size four again, iterations one, and we can just go with turbo again. Submit, and let's see what happens. Is this faster or slower than text-to-image? I'm not sure. Let's see. Decoding the image, that's the first part, and that doesn't seem to take too long, but still I'm going
to speed up the process here.

All right, so we have some results. This is actually kind of cool, but I don't see a volcano. We see some mountains; it took what I gave it, the basic structure, a cow, some grass, and mountains, but it interpreted this as a tree instead of a volcano. This one seems way more accurate: we have a cow, we have some grass, we have some mountains, and we have something that looks like fire. Here again I don't see fire, and here again I don't see fire. But this is quite impressive still. You can see the basic pattern was provided with this simple abstract graphic, and then it turned this into something that, I mean, this one looks like a pretty good cow, I would say, and this looks like somewhat of a volcano or fiery mountain or something like that. So this is how you do that. You can play around with it, you can generate hundreds of images, and you can play around with different prompts and different settings. This is actually quite fun, and once you have this running on your system, I'm sure you're going to spend hours just playing around with it.

All right, so as you can imagine, I already played around with this tool quite a bit, and I have some examples that I want to show you guys, some of them quite impressive, so that you get an idea of what this model is capable of. The first one that I have here is "Alexander the Great sunset". This is the prompt, and the result is this: basically some images of something that looks at least kind of like Alexander the Great, maybe without a proper hand, but that's quite interesting. Then we have again some Alps and some cows. The reason I included this one is that I want to show you that when you look at this in the first moment, it might seem like a solid image, but then you zoom into it and you see one cow merging with another cow, they don't have faces, they don't have proper heads, and we have some random pixels that should be cows or maybe houses or something like that. But
this is when you look at the details; then it is not as good as when you look at it initially.

What else do we have? "Barcelona beach with lots of people": same thing. It looks like a solid image, then you zoom in and people don't have proper bodies or heads. Then we have "Barcelona street sunset". We get some good images, but when you look at the details, I mean, this one's actually quite solid if you ignore that this guy doesn't have a head, and the buildings look fine, I guess. But this one, for example, looks like a part of a church that shouldn't be its own building; it's the top of a church and not just a simple tower. What else do we have? "BMW with LED light": this one is quite solid, I think this one is also quite solid, and this one, yeah, not so interesting.

And then the most interesting one, I think, is "fruit salad", because this actually looks like some solid fruit salad. This one, I don't know, maybe those points shouldn't be here, but other than that this looks like a solid fruit salad. I wouldn't necessarily think that those are AI-generated when I see them, so this is quite good.

Then I also played around with "liminal place" and "surreal place". This is quite interesting because it catches the feeling of what is liminal and what is surreal. You can see that these images look a certain way, maybe I can zoom in, and they're all AI-generated. This one is also quite good. Oh, this is my favorite, actually; this one also looks like the thing that I think about when I think about liminal places. Then "surreal place", also quite impressive. At least this one here looks like art. If a human drew or visualized this, I think it would be classified as art, so this is quite impressive.

Then, and this is also quite interesting to see, I have "futuristic city with flying cars". The reason this one is interesting is, first of all, the results are
quite good, as you can see here. But the one thing I want to show you in particular is that you can see what kind of images were included in training this model. When you go to, where was it, this one here, you can clearly see the typical stock photo or copyright watermarks, where you have a stock photo that you have to pay for but you can preview it with these marks on it. The model thinks that those are part of the image, so it included them here. This is quite interesting to see, and it means that the model was actually also trained on stock photos that were not paid for. But the results are quite impressive here as well.

What else do we have? We have "portal to fantasy world"; we get pretty similar things here. And then finally, what I also wanted to show you for text-to-image is "Stairway to Heaven". Those two are not too impressive, but this one actually looks like art again.

The only interesting example that I have here for image-to-image is the following. I have this template, a very basic drawing, nothing too fancy: just some orange sky, some grass, some path, and a basic tower with an eye and some fire on top. I think the prompt was, what was it, "fantasy world with dark mage tower", and actually I think the results are quite impressive. Let me just open this up again and go into the directory. Wrong directory. There you go. You can see that especially this one looks not too bad when you consider the input image. When you compare the two, it is quite impressive that it takes this image and the prompt and turns it into that, or into this. This one is also quite interesting, I think.

So those are just some examples. You can play around with this for hours, for days maybe, even. Let me know what your best prompt is in the comment section down below. And this is how you install, how you set up, and how you use Stable
Diffusion. So that's it for today's video. I hope you enjoyed it and hope you learned something. If so, let me know by hitting the like button and leaving a comment in the comment section down below. And of course, don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free. Other than that, thank you very much for watching, see you in the next video, and bye.
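As a recap of two details from the walkthrough, here is a minimal sketch that checks whether a repository folder contains the pieces the video sets up (the optimizedSD folder and the checkpoint saved as models/ldm/stable-diffusion-v1/model.ckpt) and that computes how many images a batch size and iteration count produce together. The function names and the `repo_dir` argument are hypothetical; only the paths and the arithmetic come from the video.

```python
from pathlib import Path


def missing_setup_items(repo_dir):
    """Return the names of the setup pieces that are not present yet."""
    repo = Path(repo_dir)
    expected = {
        "optimizedSD folder": repo / "optimizedSD",
        "model checkpoint": repo / "models" / "ldm"
                            / "stable-diffusion-v1" / "model.ckpt",
    }
    return [name for name, path in expected.items() if not path.exists()]


def total_images(batch_size, iterations):
    """Images generated in total: one batch per iteration."""
    if batch_size < 1 or iterations < 1:
        raise ValueError("batch_size and iterations must be at least 1")
    return batch_size * iterations
```

For example, `total_images(4, 2)` gives 8, matching the "first four, and then the second four" case described in the video, and an empty list from `missing_setup_items` suggests the folder layout matches what the video describes. It does not verify that the conda environment or the GPU are usable.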
Info
Channel: NeuralNine
Views: 10,769
Keywords: stable diffusion, image generation, AI image generation, ai stable diffusion, stable diffusion image generation, ai, artificial intelligence, machine learning, generative AI, setup guide, tutorial, stable diffusion setup, stable diffusion tutorial
Id: H6mmNxynlZw
Length: 26min 11sec (1571 seconds)
Published: Tue Feb 21 2023