SDXL Local LORA Training Guide: Unlimited AI Images of Yourself

Video Statistics and Information

Captions
Stability AI released Stable Diffusion XL. It's a generative AI model that can generate stunning images of just about anything. Today I'm going to show you how to train a LoRA, or low-rank adaptation. It's a small file that can be trained to instruct Stable Diffusion on how an object, person, or really anything should look. You can find hundreds of pre-trained LoRAs on Civitai for everything from animals and people to not-safe-for-work content. But what if you want to train your own LoRA to create images of yourself, or really anyone else for that matter? If you have a gaming PC, you can probably train your own model to produce high-quality, stunning images just like these. To get started, we're going to install a piece of software called Kohya SS. Let's get into it.

Kohya SS provides a user interface for you to set up the parameters for and train your own models. To get started, if you have a Windows machine, you're going to need Python, Git, and Visual Studio installed. Now, if you're already running Stable Diffusion or any other generative AI tools on your system, you probably have these installed already. If not, check out one of my other tutorial videos that step you through the process.

The first step is going to your command prompt; typing CMD will fire that off. Then go to the directory where you want to install Kohya. Make sure you've got plenty of drive space here; it is going to be pretty intensive. From there, we're just going to copy the git clone command from the Kohya install directions. That's going to clone the repo to a directory called kohya_ss. Once that's done, change to the kohya_ss directory and run the setup.bat file. Since we're performing a new installation, we're going to select option one. This part's going to take just a few minutes. It's installing a whole bunch of files and dependencies, so just sit back and relax.

Once that's done, it's going to ask you which compute environment you're running: this machine or Amazon AWS. Select this machine. If you have a multi-CPU or a multi-GPU
system, you can select one of those options; otherwise, select no distributed training. Do you want to run your training on CPU only? Absolutely not. It'd be terribly slow, especially if you have a good GPU. Do you wish to optimize your script with torch dynamo? No. Next it asks whether you want to use DeepSpeed, and then what GPUs by ID should be used for training on this machine; select all, which is the default. Now it's going to ask you if you want to run fp16 or bf16, and this is going to depend on your GPU. If you have an RTX 30 or 40 series GPU, you're going to select bf16. If you have an older GPU in your system, you're going to select fp16.

Now at this point the installation is done, so you can either go to your kohya_ss directory and double-click on the gui.bat file, or select option five if you still have your command prompt open. As you can see, that's going to start an entirely new command prompt. It's going to load everything you need in order to start the GUI, as you can see here on the right-hand side of the screen. Go ahead and close that old command prompt at this time; we're done with that.

At this point it's time to source some images to train your model with, and the important thing here is that you want a lot of different variations of lighting, facial expression, and backgrounds. This is going to make the model more flexible in the end. For example, if you wanted to train a model for Margot Robbie, you might go to Google Images and perform an image search. It's important to get really high-resolution images, so I tend to go to Tools and then filter by size for Large. You also don't want images that have multiple people in them; just something else to keep in mind. In my case, I just broke out my phone and took a whole bunch of selfies around the house and outside, with a bunch of different facial expressions and lighting environments, to get a good mix of pictures. As far as the number of images is concerned, you can really train a decent model with as few as 10 images. I tend to get anywhere from 10 to 20 for my typical
training.

Now normally at this point, if you've trained a Stable Diffusion checkpoint model before, you know that you'd do image cropping, which sets all the images to a fixed size. With Stable Diffusion XL training that's actually unnecessary, and in fact you're going to get better results if you don't do that.

Now we'll go back to the UI for Kohya. We're going to open the LoRA tab, and if you happen to be one of my Patreon subscribers, I've got this JSON file that has all the configurations that you need in order to start training your model. It's called SDXL Kohya SS LoRA config, and it's set up for an RTX 3090; that's the GPU I've got running in this machine. Now at this point you could just get going. You wouldn't have any additional configuration to do. But of course I'm going to step you through everything, so even if you aren't one of my Patreon subscribers, you can still do this start to finish.

Now under LoRA you're going to see this Tools section. Click on that, and then click on Dreambooth/LoRA folder preparation. This used to be under a tab called Deprecated, but newer versions have properly moved this under the Tools section. The first thing we're going to fill out here is the instance prompt. This is super important, and most people get this wrong. Most tutorials out there are going to tell you to use a random string of characters or something super unique in order to train your model, but what this really does is give you less flexibility and worse results. In fact, even if you're training a model of yourself, you want to use a celebrity, or someone or something else that already has a lot of images in Stable Diffusion XL, so it knows what to create. It has sort of guidance parameters, if you will. So what I like to do is go to a celebrity lookalike site and just drag and drop one of your images there. This is going to tell you which celebrity you somewhat resemble, or that your character somewhat resembles. That's a really good starting place to train your model. In my case
I'm actually going to type Tom Cruise. For the class prompt, I'm going to type man, because that's what I'm training. If you were training a cat or a dog or a woman, you'd set that as the class prompt. For training images, this is where you're going to set the directory where you saved all of the images that you collected for your character earlier.

Now on to regularization images. These help prevent model overfitting, and you're going to want hundreds of images here that represent the class of images you're trying to train; in our case, men. These need to be varied, and they need to be very high resolution. In fact, I've already got these uploaded to my Patreon for both men and women. If you're not one of my Patreon subscribers, that's okay; you can find these datasets online, or you could even create one yourself. For repeats, I always go with 20. This is the number of times that each image is going to be trained in the model. The final thing you want to do is set the destination training directory. This is where all of your output data, including the LoRA files created by the training, is going to end up. Now we just click the button to copy info to the Folders tab.

Now when you go back to Training and Folders, everything's already pre-filled. The only thing you want to make sure that you update is the model output name. So in my case, I'm going to use my name, which is Brian Lovett, and then a hyphen and Tom Cruise. That way, for this LoRA, every single one of the files is going to have my name, so I know what subject I was training, and then it's going to have Tom Cruise, which is what I know I need to use for my prompt.

Now we need to head over to the Utilities tab. We're going to have to do some captioning; specifically, we're going to click on BLIP Captioning. BLIP captioning just uses artificial intelligence to scan the images, look them over, and then create a text file that has all the keywords that are associated with how the images look. It's how you get Stable Diffusion to
understand the context and the keywords that are associated with each of the images that you're using as your training data. Go ahead and select your source images directory, make sure the caption file extension is .txt, and for the prefix to add to the BLIP caption, we're going to use the celebrity name that we had earlier; in this case, Tom Cruise. Go ahead and click on Caption Images, and then if you load up that command prompt, you're going to see that it's going through and captioning each individual image.

Once it's done, go back to your training image directory, and you're going to see each image with a text caption file next to it. When you load one, you're going to see that it has Tom Cruise, or whatever celebrity name you chose to use, and it says "a bald man sitting in a room with a lamp above him." That's not bad, but you could add some additional context here that's just going to help with the training, so you could say "wearing a gray polo shirt with two buttons." Load up another one here and it says "Tom Cruise a man in a blue shirt is taking a selfie." Not bad, but just take a few minutes and go through here and add any additional contextual elements that you want to each of these images, just to give it a little more detail about what's going on. Once you're done editing all the text files, make sure that you select them all, copy, and then paste them into your source image directory so that they're there with the training images that you're going to use.

Now we're going to jump into the meaty part. We're going to go to the LoRA training parameters. If you're using my config file, everything's already set up for you; otherwise, let's go through each of these. Train batch size: I usually leave this at one. This is the number of images it's going to train at one time. A higher number is going to use more VRAM, but it will speed up the training. I just
happen to leave it at one.

For epochs, this is basically a way to split up the training. Remember how we set 20 repeats earlier for each image? If we leave epochs set to one, it means we're going to train 20 steps for each source image and be done. If we have 10 source images, we're going to train 200 steps. Now if we set this to 10 epochs, we'll train 2,000 steps, and so on. I typically just set this to 10. The other thing I do is set save every N epochs to one. This means we're going to end up with 10 LoRA files at the end of this, and we're going to be able to go through and pick which one looks the best. There's a trade-off between flexibility and precision that I'll talk about a little bit later, and I'll show you how to find the best model.

For caption extension, make sure that's set to .txt; that's the same thing we used when we did the BLIP captioning earlier. For mixed precision and save precision, if you're running an RTX 3090 like I am, or a 40 series GPU, you're going to set that to bf16 for both; otherwise, fp16. I don't touch the number of CPU threads per core, and then I do make sure that I check both cache latents and cache latents to disk. This is just going to speed things up a little bit more.

For the learning rate scheduler, we're going to select constant, and then the optimizer is Adafactor. Now it's really important that if you select Adafactor, you copy these extra optimizer arguments from the description of the video. You need scale_parameter=False, relative_step=False, and warmup_init=False.

Learning rate: man, there is a lot of information about this online, and I trained probably 30 different LoRA files just trying different settings. I found that it really doesn't make that big a difference. Having a slightly higher learning rate just means there's going to be a little bit more difference between the different epochs, or the LoRA files that are generated, and you might find that you have a higher quality, more trained
model at a lower epoch. In my case, I typically go with 0.00003, and I suggest you do the same. For learning rate warm-up, we leave that at zero. Max resolution should be 1024 by 1024; this is the default resolution for Stable Diffusion XL. Now, you can save a little bit of VRAM here. If you don't have an RTX 3090 or a 4090 that has 24 GB of VRAM, you could set something like 768 by 768. It's going to save on VRAM and let you train a little bit more efficiently, but the trade-off is that the images that are generated are going to be a little bit lower quality.

Enable buckets: make sure this is selected. This is very important. It ensures that you don't have to crop your images. It doesn't matter what resolution, vertical and horizontal, they are; it's going to take those in and use them just fine. For both text encoder learning rate and U-Net learning rate, you're going to set those to the same value we set the learning rate to earlier. Check the box for no half VAE.

And then network rank; this one's a little bit more interesting. Network rank increases the detail retained in the model, but it also increases the size of the LoRA file that's generated. Higher numbers here are going to have more detail, better color, and better lighting, so I usually go with 256 for the network rank and one for the network alpha. But just be aware that every one of the LoRA files generated by this, and there are going to be 10 for this particular run, is going to be about 1.7 GB in size. Now if you don't have much VRAM, or you want smaller files, you could train with something like 32 for network rank and 16 for network alpha. It's going to be a slightly lower quality model, but that might be okay depending on what you're training and what you're after.

We're going to scroll back up to the top and click on Advanced. Scroll down and make sure that you check gradient checkpointing. Cross attention should be set to xformers, and then for don't upscale bucket resolution, just leave it how it is. Once you click on start
training, you can go ahead and pull that command prompt window back open. It's going to show you the progress. In my case, since I'm doing a video right now, I can't run this at the same time because it'll use too much memory. It uses about 20 GB of VRAM, and it's going to take about 10 hours because I have 40 files. If you have 10 images that you're training on, it should take about a third of that time. Keep in mind, too, that if you turn down the resolution or change some of those other settings that I mentioned earlier, you can get this to run on about 12 GB of VRAM, although 16 is probably preferable.

Once all of that's done, it's time to load this up in AUTOMATIC1111, or whatever other Stable Diffusion image generation software you use. The first thing we want to do is make sure we select Stable Diffusion XL base 1.0, and then we're going to figure out a prompt to use. I like to go over to Civitai, find a random image that I think looks cool, and use that as sort of a baseline. So we'll go ahead and select the prompt here and paste it into the prompt box. Then at the very end, we're going to go down to this LoRA section and find Brian Lovett Tom Cruise. We're going to find the first one, and we're just going to go 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. That's right, we're going to select all 10 LoRA files, and I'll show you why. You can see that that adds each of the LoRA files up here to the prompt. We're going to go ahead and right-click those and click on copy, and then we're going to delete all but the first one.

Now the really important thing is that we need our keyword trigger here for our LoRA. So where it says "close portrait of a man," I'm going to delete "a" and say "close portrait of Tom Cruise man," since that's the prompt trigger that we set when we trained this model. I'm going to crank up the sampling steps to about 30, and then we're just going to generate an image. That's not too bad for a first attempt, but I'd really like something that's
facing forward and a little bit easier to see, so I'm going to go ahead and generate another one. Also be sure to set your resolution to 1024 by 1024. This is a pretty good result, a nice-looking image. This is from the first LoRA that we trained.

Now I'm going to show you how you can see a side-by-side comparison of all 10 of your LoRA files. Down here, we're going to select this sort of recycle symbol. That's going to set the seed to this image's seed, so that every single one of the images we generate here in a minute is going to use the same seed. For Script, in the dropdown we're going to select X/Y/Z plot, and for the X type we're going to select Prompt S/R. Now remember, I had you copy all those LoRA files that we had up in the prompt earlier. Now you're going to paste those into the X values over here, and in between each one you're going to put a comma, until all of them have a comma except for the very last LoRA file. What it's going to do is look for this very first LoRA in the prompt and then replace it with a different LoRA file for every single image generation. So when we click on Generate, it's actually going to generate 10 images horizontally along the x-axis, and each one's going to use a different LoRA file. You'll see here in a minute.

All right, that takes just a few minutes, but as you can see, it produces all the images side by side. It's a really cool way to take a look at all the different LoRA files and the images that they generate. The other cool thing here is that you can sort of think about this as a continuum from left to right. The LoRA files on the right-hand side are going to produce really high-quality images that are really close to the original. You can even see that elements of the shirt sort of change; it looks more like the gray polo that I was wearing as it gets more to the right. On the opposite end of the spectrum, on the left here, you're going to get LoRA files that are
really flexible, so you might be able to get more artistic freedom. If you wanted to create an anime version of the person, or some sort of crazy hair, or something else that didn't really exist in the training data, you might have better luck using one of the LoRA files from farther to the left than those farther to the right. For me, I typically find a good balance of flexibility and precision is somewhere around LoRA three or four for my own personal uses. Let me know what you find when you train your own models. Also let me know in the comments below if you have any questions or anything else I can help out with. Otherwise, I'm Brian Lovett, this is All Your Tech AI, and we'll check you next time. Thank you so much.
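The caption clean-up step in the captions above (every training image gets a .txt file that starts with the celebrity name) is easy to get wrong by hand, so a quick check before training may help. The following is a minimal sketch, not part of the video or of Kohya SS: the function name `check_captions`, the list of image extensions, and the problem-message wording are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image types; adjust to whatever your phone or downloads produce.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def check_captions(image_dir, trigger):
    """Return a list of problems; an empty list means every image has a
    .txt caption that begins with the trigger text (case-insensitive)."""
    problems = []
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # skip caption files and anything that is not an image
        caption = image.with_suffix(".txt")
        if not caption.exists():
            problems.append(f"{image.name}: missing caption file")
            continue
        text = caption.read_text(encoding="utf-8").strip()
        if not text.lower().startswith(trigger.lower()):
            problems.append(f"{image.name}: caption does not start with '{trigger}'")
    return problems
```

Calling `check_captions("path/to/training/images", "Tom Cruise")` and printing the result gives a short to-do list of captions to fix before starting a multi-hour training run.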
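The step counts quoted in the captions (10 images at 20 repeats is 200 steps for one epoch and 2,000 steps for 10 epochs) follow from simple arithmetic. Here is a small sketch of that calculation, assuming steps per epoch are images times repeats divided by batch size, rounded up. The function name is hypothetical, and the sketch only reproduces the numbers from the captions; it does not model how the trainer counts regularization images.

```python
import math


def total_training_steps(num_images, repeats, epochs, batch_size=1):
    """Total optimizer steps for a run, using the arithmetic from the captions.

    Assumption: one step processes `batch_size` images, and a partial
    final batch still counts as a step.
    """
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


one_epoch = total_training_steps(num_images=10, repeats=20, epochs=1)
ten_epochs = total_training_steps(num_images=10, repeats=20, epochs=10)
print(one_epoch, ten_epochs)  # 200 2000
```

Raising the batch size lowers the step count for the same number of images seen, which matches the remark that a larger batch uses more VRAM but speeds up training.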
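Typing the comma-separated X values for the Prompt S/R comparison by hand is tedious with 10 files, so the list can be generated instead. This is a hedged sketch under several assumptions that are not stated in the captions: that LoRA tags in the prompt look like `<lora:filename:weight>`, that per-epoch files are saved with a six-digit epoch suffix such as `-000001`, and that the final epoch is saved without a suffix. The output name `Brian_Lovett-Tom_Cruise` is a hypothetical stand-in; check the file names in your own output folder before relying on any of this.

```python
def lora_tags(output_name, epochs, weight=1.0):
    """Build one prompt tag per saved epoch.

    Assumed naming: epochs before the last are '<name>-000001', '<name>-000002',
    and so on, and the last epoch is plain '<name>'.
    """
    tags = []
    for epoch in range(1, epochs + 1):
        if epoch < epochs:
            filename = f"{output_name}-{epoch:06d}"
        else:
            filename = output_name
        tags.append(f"<lora:{filename}:{weight:g}>")
    return tags


def prompt_sr_values(tags):
    """Join tags with commas, with no trailing comma after the last one."""
    return ", ".join(tags)


# Hypothetical output name; the first tag is the one that stays in the prompt.
tags = lora_tags("Brian_Lovett-Tom_Cruise", epochs=10)
print(prompt_sr_values(tags))
```

The first value in the list is the search text, so it has to match the tag left in the prompt exactly. Because the values are split on commas, an output name that itself contains a comma would not work with this approach.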
Info
Channel: All Your Tech AI
Views: 86,244
Keywords: stable diffusion, stable diffusion tutorial, sdxl 1.0, lora training, stable diffusion xl, stable diffusion xl 1.0, sdxl lora training settings, sdxl 1.0 lora, stable diffusion ai, sdxl 1.0 lora training, lora sdxl 1.0, lora sdxl 1.0 training, kohya lora sdxl, sdxl training, sdxl training dataset, kohya lora sdxl 1.0 training, sdxl lora training, kohya stable diffusion
Id: y2J7EZUk_a0
Length: 17min 9sec (1029 seconds)
Published: Wed Jan 03 2024