Stable Diffusion Crash Course for Beginners

Video Statistics and Information

Captions
Learn how to use Stable Diffusion to create art and images in this full course. You will learn how to train your own model, how to use ControlNet, and how to use Stable Diffusion's API endpoint. This course focuses on teaching you how to use Stable Diffusion as a tool instead of going into the technical details, so it's a perfect starting place for beginners. Lin Zhang developed this course; she is a software engineer at Salesforce and a freeCodeCamp team member. Let's get started.

Hey everyone, I'm Lin. I'm a software engineer and a hobbyist game developer, and today I'm bringing you a new hands-on coding tutorial. In today's video we're going to generate art using an AI tool called Stable Diffusion. If you look up the definition, Stable Diffusion is a deep learning text-to-image model released in 2022, based on diffusion techniques. In today's video we're going to focus on teaching you how to use Stable Diffusion as a tool instead of going into the technical details; to understand some big terms like variational autoencoders, embeddings, or diffusion techniques, you would have to have some machine learning background and also do some research on your own.

There is a hardware requirement for this course: you will need access to some form of GPU, either local, or on AWS, or some other cloud-hosted one, to try out the course material, because we will be hosting our own instance of Stable Diffusion. Unfortunately, this will not work in the free GPU environment that Google Colab provides, because Google Colab bans this application from running in its notebooks. Don't worry if you really don't have access to any GPU power: it is still possible for you to try out some web-hosted Stable Diffusion instances, and I will include an extra part at the end of this video to show you how you can access these cloud-hosted environments.

That said, let's take a look at the topics we're going to cover today. First, how to use Stable Diffusion and how to set it up locally. Second, how to train your own model for a specific character or art style; we call these LoRA models. Third, how to use ControlNet, a popular Stable Diffusion plugin. And last but not least, how to use Stable Diffusion's API endpoint. At the end of the course you should have no problem generating images that look like these. Pretty impressive, huh? All right, I know you're excited, but before we get started, a disclaimer: we respect the work of artists and acknowledge that AI-generated art using Stable Diffusion is a tool that can be used to enhance creativity, but it does not replace the value of human creativity.

And now we are ready to get started. Let's start by installing Stable Diffusion on your local machine. We're going to go to its GitHub repository and install it based on the instructions; I'm going to be installing on a Linux machine. The installation process may take a while, so be patient. All right, now everything has been installed. Let's open the directory that we just installed and inspect the contents. We can see that there is a subdirectory for the models, and the instructions under models/Stable-diffusion say to put Stable Diffusion checkpoint models here, so let's go and download some models.
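To make that layout concrete, here is a minimal sketch of the install step and where the downloaded files will go. The repository URL is an assumption based on the standard AUTOMATIC1111 web UI, which is what the webui.sh and webui-user.sh scripts used later in this course belong to; the folder names follow the instructions described above.

```sh
# Hypothetical install on a Linux machine (repository URL assumed)
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui

# Checkpoint models (.safetensors) downloaded from Civitai go here:
ls models/Stable-diffusion

# VAE files go in models/VAE; create the folder if it doesn't exist:
mkdir -p models/VAE
```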
The website that we are going to be using is Civitai. Civitai is a model hosting site, and we can see there are lots of models uploaded by different users. One of the models that I like to use is called Counterfeit; it generates anime-like images. When we open the model page, we can see some sample images generated by the model, and we can take a look at the files. These are the uploaded models, and it looks like the maintainer of this model regularly updates the versions, so we can go to their links to see the updated versions. We see a bunch of files ending in .safetensors, and these will be the models that we're going to use. We also see a .vae.pt file; this is a variational autoencoder model that will make our images look better, more saturated and clearer.

We're going to download the models and put the checkpoint models in models/Stable-diffusion and the VAE in models/VAE; if the VAE folder doesn't exist, just create it. Once we have downloaded our checkpoint and VAE models, we're almost ready to launch the web UI. We can customize some settings in webui-user.sh. When you open this file, the exported COMMANDLINE_ARGS should be an empty string; for my command line arguments I have configured --share, which means that when I launch the web UI, besides localhost it will also expose a publicly accessible URL. This way my friends can access my locally hosted web UI via this public URL. Here's a part that I didn't capture in the video: for the web UI to use the VAE by default, you have to set the VAE path in webui-user.sh. Here is my webui-user.sh; it has lots of customizations.
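Here is a rough sketch of what a webui-user.sh with those customizations might look like. The flag names come from the standard web UI launcher, but the VAE filename is a placeholder, and which flags you actually want (the extra ones are explained later in the course) depends on your setup.

```sh
# webui-user.sh -- hypothetical example of the customizations used in this course
# --share           expose a public URL in addition to localhost
# --xformers        performance boost on certain hardware
# --theme dark      dark UI theme
# --no-half-vae     avoid some floating-point errors in the VAE
# --api             enable the HTTP API used at the end of this course
export COMMANDLINE_ARGS="--share --xformers --theme dark --no-half-vae --api \
  --vae-path models/VAE/your-vae-file.vae.pt"
```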
Now it's finally time to start the web UI, using webui.sh; if you're a Windows user, you will run webui.bat instead. Starting the web UI is going to take some time, so I will speed up the video. From the log lines you can see that our VAE has been loaded and that we have both a localhost URL and a public URL. Let's try out the public URL. So this is what the web user interface looks like: we can see a big text box here for entering prompts to generate an image. All right, let's start typing something. I'm going to ask for a girl with short brown hair and green eyes, with a simple background. Let's hit the big Generate button; it's going to take a few minutes, so I'm going to speed up the video for you. All right, it looks like we got our girl with short brown hair and green eyes on a simple background, so our text prompt in the form of a long sentence worked.

It is also possible to use keywords, or tags. We're going to look at this website that has some reference tags; for example, you can see tags like 1girl, long hair, short hair, skirt. Most of the checkpoint models we're using for Stable Diffusion are indeed trained on these tags, so the models will have no problem parsing those keywords. Let's rewrite our prompt using those keywords. After rewriting our prompt, we're going to look at some of the parameters. When you hover over a parameter, there's some explanation text. We're going to increase the batch size so that we get multiple images from one generation, and turn on Restore faces; you can learn about the other parameters by hovering over them and reading about them yourself. And indeed, we got some girls with brown hair, green eyes, and round glasses on a very simple background.

I actually want Stable Diffusion to help generate some images of Lydia, our protagonist from Learn to Code RPG, who has brown hair, green eyes, round glasses, and also braids her hair at the back of her head. So we're going to add "braided hair" to see if the AI will pick it up. If you're like me, you might notice that the background is a little bit too green, and we can adjust that: we just need to put in a negative prompt for the green background, and we will see different-colored backgrounds. The good news is that the backgrounds are no longer green; however, the braided hair isn't exactly in the direction that we want it to be, so I guess we should adjust that in the prompt. I feel like it's also a good time to experiment with a different sampling method. Different samplers can produce pretty different art styles, and you can see some comparison images if you do a Google search; I will also try to link to some in my article. It looks like the first and second pictures do look like Lydia's hairstyle. All right, we're just going to keep playing with it and add some different prompts.

So we asked the AI to show the character's hands; however, the hands are pretty deformed, and here we're going to try to fix them by using embeddings. We again go to the Counterfeit model page, and we see EasyNegative; this is one of the textual inversion embeddings that we can use. The first row of the comparison is with EasyNegative and the second row is without, and it looks like EasyNegative does enhance the image quality and make better hands. All right, we're going to grab that EasyNegative.safetensors file and put it in our embeddings directory. Going back to our web UI, we're going to add EasyNegative to the negative prompt: click on the button that shows a portrait, and under Textual Inversion we have EasyNegative; clicking on it will bring it into the negative prompt. All right, let's do some more generations. So what do you think, do the images look better with EasyNegative? I think so.

It looks like we've got the basics of text-to-image; now it's time to try out image-to-image. Let's save this image for our use, go to the img2img tab, and upload our image here. Suppose we want a similar pose, but instead of a brown-haired girl we want a pink-haired girl. Image-to-image also has batch size, Restore faces, and all the other settings we've seen in text-to-image. Let's hit Generate. We can see that the generated images all have similar poses to our original image, but the hair color has changed to pink. Let's add our EasyNegative embedding and try some other prompts; for example, this time instead of a white background I would like to add some detailed backgrounds. Hit Generate: all right, similar poses, no glasses, and some books in the background. For other image-to-image options, besides uploading a picture you can also do Sketch, or Inpaint to repair or restore an image by repainting selected areas.

This is pretty much it for the basic usage of Stable Diffusion. Next we're going to cover how to train a model for a specific character or a specific art style; these models are also known as LoRA models. The internet definition of LoRA is "low-rank adaptation": a technique for fine-tuning deep learning models that works by reducing the number of trainable parameters and enables efficient task switching. So essentially, for Stable Diffusion, we are patching the checkpoint models so that the generated images will look more like our character or art style.
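To give a rough intuition for that definition, here is a tiny conceptual sketch in Python of what a low-rank update looks like. This is not the trainer's actual code, and the numbers are made up; it only illustrates why a LoRA has so few trainable parameters: instead of fine-tuning a full weight matrix, you learn two small low-rank matrices whose product gets added on top of the frozen base weights.

```python
import numpy as np

# Conceptual sketch of a LoRA update (not the real training code).
d_out, d_in, rank = 768, 768, 8           # typical-ish layer size, tiny rank
W = np.random.randn(d_out, d_in)          # frozen base-model weight matrix
A = np.random.randn(rank, d_in) * 0.01    # small trainable matrix
B = np.zeros((d_out, rank))               # small trainable matrix (starts at zero)
alpha = 8                                 # scaling factor

# The "patched" weight the model effectively uses at inference time:
W_patched = W + (alpha / rank) * (B @ A)

full_params = W.size                      # 589,824 parameters to fine-tune normally
lora_params = A.size + B.size             # only 12,288 trainable parameters
print(f"full fine-tune: {full_params:,} params, LoRA: {lora_params:,} params")
```

That small patch is what gets saved as the LoRA file we are about to train, which is why LoRA files are much smaller than full checkpoints.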
For this part of the tutorial, we won't need to use your local GPU; we will be doing everything in Google Colab. We're going to use this amazing tutorial on Civitai to train our LoRA. It's worth noting that, since Google Colab is an online collaborative service and the maintainer might need to modify or update some of the code inside the notebooks, it is perfectly normal if your notebook doesn't look like mine, or if you run into some errors I didn't run into while recording this tutorial; in that case, reaching out to the maintainer on the Civitai post might help.

Cool, time to take a look at the dataset requirements. For you to be able to train your LoRA, you will need anything between 20 and up to a thousand images of your desired character or art style. Your images should also have some diversity: if all of them are close-up shots of the face, your model will have a hard time generating the whole body. Feel free to read the tutorial thoroughly in your free time, but I'm going to jump straight to copying the notebook. Here is the notebook that we just copied. I'm going to put down Lydia as the project name, because I want to be able to generate a model specifically for Learn to Code RPG's protagonist, Lydia. When we run this cell, it's going to prompt us to connect to our Google Drive so that it can read our training images, and in the cell you can decide whether to grab images from the internet or to upload them to your Google Drive so they can be read in from your Drive folder. Here in your Google Drive you can see that the first step of the Colab notebook created a LoRA folder, and inside it there is your project, Lydia; we're going to upload our training images to this dataset folder. I have 13 or 14 images generated by AI, and when I was creating this tutorial, Quincy asked whether I think there will be an in-breeding effect if the AI is consuming images that were also generated by AI, but I think we'll be okay. I'm just going to give it a few minutes for the upload to complete.

Now we're ready to curate our dataset. Here is a cell titled "curate your images"; it will help you eliminate duplicates from your training set. However, I know that my training set doesn't have any duplicates, so I will skip this part. All right, this next cell will use some AI tools to help us tag our images, in other words to generate those keywords we'll be using as the text prompt. So for those Lydia images we might expect tags like brown hair, short hair, green eyes, 1girl, solo, face focus, and so on. Once this AI auto-tagging is complete, the next step will help us curate our tags. This global activation tag will help identify our LoRA model when we are using a base Stable Diffusion model: when we put the activation tag in the text prompt, the Stable Diffusion base model will know to generate the character or art style specific to our LoRA. Let's wait for these cells to finish and take a look at the results. Here are the top 50 tags from our training set, and we see 1girl, solo, brown hair, green eyes, all those tags that we expect. Now we run this curate tags cell to add the global activation tag to our tags, and we are basically ready to go. When we look at the extras, we can analyze the tags, and we should be able to see our global activation tag as well as the other tags we identified earlier.

Just like for the last notebook, in the project name we're going to put in Lydia, and we can inspect some of the training parameters; for example, you get a base training model, but you can also use your own if you don't like the default, and there is the activation tag, which is the global activation tag that we put in earlier. But before we can run this notebook, we need to close the other notebook, and not just close it, we need to delete its runtime too; this is because Google Colab won't let you run more than one runtime if you are using the free version. So we go back to our dataset maker notebook, interrupt any execution that might still be happening, go to manage runtimes, and delete this one. Now we can return to our training notebook. There is one additional parameter that we need to think about, which is the training steps; this will determine how long you will train your model for. Let's look at some guidelines from the original Civitai post. The guide is saying that too few steps will undercook the LoRA and make it useless, and too many will overcook it and distort your images, so we're going to find a balance between these. This notebook also says that your images will repeat this number of times during training, and it recommends that your number of images multiplied by their repeats is between 200 and 400. Since we have around 13 images, we're going to slightly increase the number of repeats, and instead of saving every epoch we're going to save every two epochs.
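To make that guideline concrete, here is the arithmetic sketched in Python. The 200-400 range is the notebook's own recommendation; the repeat count, epoch count, and batch size below are illustrative assumptions, and the step formula is the usual images × repeats × epochs / batch size used by this style of trainer, so treat the totals as approximate.

```python
num_images = 13       # our Lydia training set
repeats    = 20       # slightly increased so images * repeats lands in range
epochs     = 10       # saving a checkpoint every 2 epochs
batch_size = 2        # assumed; depends on the notebook's defaults

images_x_repeats = num_images * repeats
print(images_x_repeats)               # 260 -> inside the recommended 200-400 range

total_steps = images_x_repeats * epochs // batch_size
print(total_steps)                    # roughly 1300 optimization steps
```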
Let's leave all the other parameters at their defaults and start running the training. Training might take some time depending on your configuration; my model took twenty-something minutes to train, and I will speed up the process for you. All right, now we see the final steps complete and our model is fully trained. Inside the Lydia output folder we see multiple models ending in .safetensors; the number indicates the number of epochs the model had been trained for. Now comes the fun part of evaluating our model by generating some images.

Let me take some time to explain the customizations I put into webui-user.sh: --share, as we mentioned before, generates a public URL; --xformers, according to the docs, will increase performance on certain hardware; --theme dark, because I prefer the dark theme to the light theme; the VAE path; and also --no-half-vae, to prevent some floating-point errors. All right, now we launch the web UI and wait for it to give us the public URL. Now the web UI is ready. As usual, we put in our EasyNegative embedding, and let's first try the LoRA that has been trained for eight epochs. All right, the results are looking pretty decent: without any other prompts, the LoRA has captured some of Lydia's character traits, for example the short brown hair and the green eyes. However, we can do better. As you may remember, during training we put in an activation keyword, and now let's put that activation keyword into the prompt to see if it helps guide the LoRA model even better. All right, now we've got the results. You might notice that there are a lot of half-body images instead of the more diverse full-body ones we saw with the previous prompt, before we added the FCC Lydia activation keyword. This might be because our training set has a lot of half-body images, and that's what this model selected for; if you are doing your own character, remember to have some diversity in your training set poses.

All right, now let's try the model that has been trained for ten epochs to see if there is any major difference. The results are not significantly different, but do notice that in one of the images we get a hint of some glasses without us having to enter glasses as part of the tags. Now let's try adding a lot more text and being more specific with the model to see if it generates more detailed images. All right, now the results have a lot more resemblance to Lydia. Well, in Learn to Code RPG, Lydia worked at a cafe as a barista while she was also learning how to code on the side, so let's change up the background to see if we can make Lydia a barista: cafe, barista outfit, and an apron. All right, it looks like we might be missing the glasses, so let's try again. We can also change our base model so that we get a completely different base art style; I have another one, and we can give that one a try too. All right, now the model is fully loaded; let's click the Generate button. Well, this model is supposed to mimic the style of NijiJourney, which is made by Midjourney, and it looks like it has a more vibrant art style, which some people might enjoy better. Feel free to experiment with different base models and see which one you like the most.
Next, we can try adding more LoRAs and generating more complex images. Navigate back to the Civitai website that we have open, and we can see that besides those checkpoint models there are some LoRA models. If we click on the sample images, they have the generation data there: the text prompt, negative prompt, sampler, number of steps, etc. These generation data are a really good reference, and especially if you copy the seed too, you should be able to get an image very similar to the sample image; there is no guarantee that they're going to be exactly the same, but they should be very similar. All right, let's find some LoRAs to use alongside the LoRA model that we have already trained. For example, I like this LoRA model which is trained on black-and-white manga images, and it should give your images a manga-like style if you copy its trigger words. Let's download it and put it in our models/Lora folder. Now, in my web UI, if I refresh I should be able to see the anime lineart, manga-like model, so let's add it to our prompt; we need to add the trigger words too, which are lineart and monochrome. All right, we can see that now our images look like black-and-white manga style; that's pretty cool.
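For reference, this is roughly what the prompt box ends up looking like once a LoRA and its trigger words are added. The <lora:...> syntax is how this web UI references a LoRA file by name and weight; the filename below is a placeholder for whatever file you actually downloaded, and the remaining tags are just the ones we have been using so far.

```text
Prompt:   <lora:anime-lineart-manga:1>, lineart, monochrome, 1girl, solo,
          brown hair, green eyes, round glasses, simple background
Negative: EasyNegative
```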
In the next part, we are going to cover how to use ControlNet to gain more fine-grained control over your image generation. Here is the GitHub page of the ControlNet web UI plugin. ControlNet gives you fine-tuning power over your images: you can fill in your line art with AI-generated colors, you can just scribble and let the AI fill in the rest for you, or you can even control the pose of the characters in your picture. Let's scroll through this page and get to the installation part. Follow the installation steps, and if you run into any security warnings, you might need to add --enable-insecure-extension-access to your webui-user.sh, restart the web UI, and try the installation again. After installation, just follow these steps to download the models you need and put them in the correct folder. Once you've done this, it's time to go back to the web UI and try out the ControlNet plugin.

All right, when expanded, the ControlNet plugin looks like this: there's an area to drag and drop images, and there are several parameters that you can set. I'm going to turn on Low VRAM, enable the ControlNet plugin, and turn on Pixel Perfect, and I have two models, a scribble one and a lineart one. Let's redo our text prompt; both the text prompt and the ControlNet parameter settings will affect how the images turn out. Cool, so I'm going to connect my tablet and just draw something really random on the canvas and see what we can generate from there. Oh, don't place too many bets on my drawing skill; I'm just going to roughly draw a person. It's really hard to use the provided pen tool, but yeah, this is the head, those are the eyes, and the body. I hope ControlNet and the amazing AI are going to help me fix my drawing. I'm being totally unnecessary here, but I'm going to draw some ribbons, and they of course don't even look like ribbons. Well, we can see that the generated images basically followed all the line work I did and filled in some pretty amazing-looking colors. It's actually pretty impressive, and they even got the ribbons right.

Well, that was the scribble model; now let's use some real line art and try the lineart model as well. I do have some promotional artwork I did for Learn to Code RPG, so let's try it. Now let's configure our text prompt: we'll do the regular EasyNegative for the negative prompt, then pick a sampling method, and we want the scene to be a cafe, with soft and detailed lines. You might have noticed that we actually didn't put in any request for people to be in the image; we'll see whether the generated images actually have people in them or not. This time we use the lineart model. Because we didn't ask for any people to be in the artwork, we are only getting images of the cafe background, so I guess we need to explicitly put two girls in the prompt, and it looks like this time we are getting girls in the images. The AI has filled in colors for our line art. So that was the result of the lineart model; the colors are a little bit weird, so let's give the AI some more freedom and use the scribble one instead. Well, now the AI has full freedom to imagine the faces of the two girls, and it looks like the results are pretty decent. The generated images are square, but our original line art picture isn't, so let's fix up the aspect ratio and, again with the scribble model, generate some more images. So yeah, on the left is my original line art, and it looks like the AI has generated something with pretty vibrant colors; these actually look pretty good, and I'm very impressed.

All right, besides ControlNet there are some other really powerful plugins, or extensions, maintained by other open source contributors, and I'm going to show you where to find those. On the wiki of the Stable Diffusion web UI Git repository, we see a page for extensions. For example, this one looks like it works with ControlNet, and this VRAM estimator will let you increase the dimensions of your image and the batch size until you run out of memory, so you can maximize your VRAM usage. This one will allow you to draw some poses. This one says it will target an area to selectively enhance details. This one I've seen a lot of people use to generate videos, and it works with the ControlNet extension too. This one will help you fine-tune where your LoRA acts in the image. This one seems to show a custom thumbnail view of your LoRA models. This one is for localization, and this one is for editing poses. I'm just scrolling through (it's a long list) to see what kinds of cool effects we can get. This one converts your images into a pixel art style, which looks pretty cool. This one uses Transformer models to generate text prompts for you. Another pixel art one, with other effects too; another prompt generator; this one saves intermediate images while the AI is trying hard to generate them; and there's even one that will automatically remove backgrounds and leave only the central figures. So yeah, there are lots and lots of these extensions out there and a lot for you to explore; we can't possibly cover everything in a single course, and people also create their own plugins and extensions as they see the need.

Moving on, let's talk about generating images using the Stable Diffusion API. There is an API docs page on the Stable Diffusion web UI repository, and to enable the API we need to turn on --api in our webui-user.sh. We can see that there are several endpoints, for example the text-to-image API endpoint and the image-to-image API endpoint, and when we scroll down we see a sample payload. We just put our prompt, number of steps, and other parameters in the payload, and it works exactly the same as if we were using the web UI itself. So the idea is that we send the parameter payload to the web UI's API endpoint using a POST method, and then we get back some bytes that we can decode into our image.
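Here is a minimal sketch of that idea, before we look at the official snippet. It assumes the web UI was launched with --api, that it is reachable at the URL below (replace it with your own localhost or public URL), and that the default /sdapi/v1/txt2img route from the API docs is being used.

```python
import base64
import requests

# Replace with your own web UI address (localhost or the public --share URL).
url = "http://127.0.0.1:7860"

# The payload holds the same parameters you would set in the web UI.
payload = {
    "prompt": "puppy dog",
    "steps": 20,
}

# POST the payload to the text-to-image endpoint...
response = requests.post(f"{url}/sdapi/v1/txt2img", json=payload)
response.raise_for_status()

# ...then decode the base64-encoded image from the "images" field and save it.
image_b64 = response.json()["images"][0].split(",", 1)[-1]  # strip any data: prefix
with open("output.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```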
Below the sample payload on the docs page, the developers have provided us with a simple Python code snippet that we can use to query the endpoint, get our image, and save it as output.png. All right, let's grab that code snippet and run it locally. For now I'm just going to copy-paste it and run it directly, and I will talk through the code in a little bit, once we see the output; just know for now that our payload is a simple "puppy dog" prompt. If we don't have some of the modules that we have to import installed, we need to install them using pip. Then just run python3 sd.py and wait for it to finish. Now it looks like it has finished, and we have an output.png file, which is a puppy. All right, let's try some different prompts. We can add negative prompts into the payload too; run the file again and see what we get. Oh, it looks pretty futuristic and really cool. Of course, we could also append our LoRA model to the prompt and see if we can get Lydia generated. This run takes a bit longer because we have increased the number of steps, and we get a picture of Lydia.

The last five minutes of the video are a bonus part. We're going to use Postman, which is an app for testing API endpoints; we are going to look at how the image is represented in the API response, and we'll also take a deep dive into the Python code and walk through it line by line. This is the Postman app. We're going to put in our request URL, which is the text-to-image endpoint, and we're going to use the POST method; make sure the body is raw JSON, and click Send. We get this long string representing our image, and in the response we can see some other parameters that we don't need to pay too much attention to. I searched for a base64-to-image decoder online, and we'll see whether it works or not. All right: remove the quotes, remove the leading part too, and here's our image.

Now let's walk through the Python file line by line. Here we're doing a bunch of imports: we import json, we import requests for making a POST request, we import base64 for decoding, and we import some image libraries. We set our URL to be the public instance that our web UI is running on, and we create our payload just like this. Then we POST to our URL plus the specific API endpoint, which is text-to-image, with the payload, take the response, and convert it to JSON. Remember the images field that we saw? That's our image string, so we decode it into an image. This part here retrieves some metadata for the PNG, for example what parameters were used to generate it, and finally we save the image along with this PNG info. It's pretty straightforward; feel free to play with the code on your own. You can also make API requests in other programming languages of your choice.
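To tie the walkthrough together, this is the kind of payload variation described a moment ago: a negative prompt, more steps, and our own LoRA appended to the prompt. The field names mirror the web UI parameters, while the LoRA filename and activation keyword are placeholders for whatever your own training run produced.

```python
payload = {
    # The LoRA tag and activation keyword are placeholders for your trained model.
    "prompt": "<lora:Lydia-10:0.8>, fcc_lydia, 1girl, solo, brown hair, "
              "green eyes, round glasses, detailed background",
    "negative_prompt": "EasyNegative",
    "steps": 40,     # more steps, so generation takes longer
    "width": 512,
    "height": 512,
    "seed": -1,      # -1 means a random seed
}
# Reuse `url` and the request/decode code from the earlier sketch:
# response = requests.post(f"{url}/sdapi/v1/txt2img", json=payload)
```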
All right, as promised, in case you don't have access to a GPU, here are some ways you can run Stable Diffusion on free online platforms. Of course, there are going to be restrictions: you won't have access to all the models you want, you won't be able to upload your custom model, and other people on the internet are also using the servers, so you could be in the queue for a long time before your images are generated. Here we are on Hugging Face; we go to Spaces, and in the search bar we type in "stable diffusion" and sort by the most likes. It looks like the most popular one is the one by Stability AI, which happens to be the original one, so this would be the best option; however, it shows that it is paused, so I don't think you'll be able to use it. Let's take a look at the other ones that we have access to. Of course, these online demos might be limited, in that they may not have the base model or the style you want, but this is the best we can do without access to a local GPU. The hands in this result are a little distorted, so maybe this Space's model isn't great for generating anime images; that's one of its limitations. Looking through the list, we see some image-to-image ones, some inpainting ones, and some photorealistic ones; I think we'll be using the photorealistic one, so let's give it a shot. This one looks like the standard Stable Diffusion web UI that we're used to, and we can see that it's currently live and running on an online GPU. They say the model is Protogen, which should be a photorealism model. It looks like they are restricting the number of images we can generate, but let's do it, and we will also have to wait in the queue. So yeah, if you don't like to wait, or if you really need to use your custom models, you should consider getting your own GPU. All right, finally, after waiting for almost five minutes, I got my result back, and it looks pretty good. And with that, we'll conclude today's tutorial. I hope you enjoyed it, and I hope to see you in the next video.
Info
Channel: freeCodeCamp.org
Views: 128,046
Id: dMkiOex_cKU
Length: 60min 42sec (3642 seconds)
Published: Mon Aug 14 2023