Stable Diffusion Lora Training with Kohya (Tutorial)

Video Statistics and Information

Captions
Hi, Seth here, and welcome to the channel. This video is a bit technical, but I will try to simplify it as much as I can and give you some tips and tricks as well. I wanted to make a unique character based on my colleague in Stable Diffusion, so I played around and came up with this image, generated in SDXL. I was able to create multiple styles like this: art inspired by the game Red Dead Redemption, a Harley Quinn cosplay, even a da Vinci style chalk drawing and a digital cyborg art. And I could do all this using the base SDXL checkpoint with my own trained LoRA model, so let me show you how.

Before I start, I want to thank the user The 3D Sphinx, who requested this tutorial. It was a learning curve for me, but I managed to learn and train my own model, and I am here to make things easy for you. I will not be using Automatic1111 for this tutorial; I will be using the Kohya GUI. It works better, and there are certain parameters that let you train a model on a low-VRAM GPU. If you are new, don't know what I am talking about, and want to learn, I have a playlist on the channel named Stable Diffusion, which would be an excellent place to start.

I will start with installing Kohya for Windows and won't be doing a Colab tutorial. I have a 4090, so some of the settings are fine-tuned for the 24 GB 3090 and 4090 cards; however, I will also tell you how to set it up with low VRAM. This is the GitHub page for the Kohya GUI (I will leave the link in the description). The first thing you need to do is scroll down the page to the installation section. The prerequisites are very simple: install Python 3.10, and remember to tick 'Add Python to PATH' during installation. Next, you need to install Git. If you are on the latest version of Windows 11, I don't think you need a separate install of the Visual Studio redistributable; if you don't have it, use the link to install it. If you have an NVIDIA 3000 or 4000 series card, you need to install the version 8.6 DLLs for the CUDA Deep Neural Network library (cuDNN), so scroll further down and download the zip file from the link given; this will speed up the training process drastically.

I have created a folder called 'LoRA Training' on one of my drives, and I will be installing Kohya here. The first thing you should do is right-click and open a terminal, then paste the git clone command to clone the GitHub repository. Now close the terminal. We need to extract the zip file downloaded earlier and put it inside the newly created kohya_ss folder. In that folder, right-click and open the terminal, type in .\setup.bat, and press Enter. This is just for the initial install and configuration; later on, you can start Kohya by clicking the gui.bat file in the main folder. Now type 1 and press Enter; this will install Kohya. Here, go with Torch 2.
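For reference, those terminal steps boil down to something like the following. This is a sketch assuming the bmaltais/kohya_ss repository (the Kohya GUI repo linked in the description); the exact commands may change between versions, so check the GitHub page:

```powershell
# Run from inside the "LoRA Training" folder (right-click > Open in Terminal)
git clone https://github.com/bmaltais/kohya_ss.git
cd .\kohya_ss
# Extract the cuDNN 8.6 zip into this folder first, then:
.\setup.bat    # choose option 1 to install, and Torch 2 when asked
# Later sessions: launch the UI with .\gui.bat
```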
Just wait for the installation; it takes some time. After the installation is completed, we need to configure Accelerate manually. You can use the up or down arrow keys and hit Enter for selection. For the environment, select 'This machine'. Now select 'No distributed training'. Keep typing 'no' until it asks you about the GPU ID. For the GPU ID, type 'all'; however, if you are on a laptop, type '0'. For the mixed precision, if you use an older NVIDIA GPU, select fp16, as bf16 is not supported. If you are on a newer GPU, like the A100, or a 3090 or higher, select bf16, as it offers better stability during training. And that's about it; type in 5 to launch Kohya.

There is no one formula or set of parameters that I can tell you to use; it all depends on what you are training the model for. For checkpoint training, you need thousands of images for your dataset; it is much easier to train a LoRA model based on a pre-existing checkpoint. In this tutorial, what I am going to do is train a character LoRA model based on the SDXL 1.0 checkpoint. I will explain the most important things you need to know in the simplest possible way, so it is easy for you to get started.

Let's go to the LoRA tab first. The configuration file is a .json file that saves your parameters and settings. From the 'Model Quick Pick' you can choose a model from the list, and you can choose any checkpoint model downloaded from Civitai by selecting 'custom'. I will be using the SDXL VAE-fixed checkpoint; I will leave the Civitai link for the model in the description. For all models, whether base models or checkpoints fine-tuned over SDXL, you need to tick the 'SDXL Model' checkbox.

Before we go into the parameters, we need to prepare the dataset. The instance prompt is basically a keyword that will trigger the LoRA; you can put anything here. For example, I am going to put 'control alt ai'. The class prompt, on the other hand, should define your subject or style. In my case, I am using portrait images of a real person, so I will put the class prompt as just 'woman'.

Training images are the most crucial part; your trained model will be based on these images. Around 25 to 50 images will do for a subject or an object; for a style you would want to go a little higher, about 100 to 150, maybe more. To be honest, I have not tried training a style yet. The important thing here is that the images all use one extension, whether PNG or JPG; they should all be in the same format. The aspect ratio of the images doesn't matter; however, the images should be of very high quality. I mean, if you are taking images of a real person, the facial details should be sharp and clear. From my testing, if you do not get the desired results, it's primarily due to a lack of variety and quality in these images. If you are training a real person, you should have different angles of the person. Again, it depends on what you want to do; in my case, I am just using the face. If you want the AI to generate a far-away image of the person, then it's better to have some full-body images in the dataset. The same goes for any subject, object, or even style. Say you are training a model for birds: depending on whether you want one species or different species, you should include bird photos from all angles accordingly, such as birds flying, doing other actions, still shots, and so on.

Coming to regularization images, they should represent the subject, object, or style class. My class is 'woman', so my set of regularization images should all be of women. There's one trick here: since my subject has curly hair, it would be better if all my regularization images are of women with curly hair. How many regularization images do you actually need? Well, different people will say different numbers; however, from what I have tried and tested, it should be approximately twice the number of training images or higher. I cannot define a specific number. How do we generate these regularization images? Let me show you a simple and easy way to do it. Let's open Stable Diffusion: the best way to do this is to use the same checkpoint model to generate the regularization images. I will use a simple prompt, 'portrait photo of woman, curly hair', and generate the images in batches.
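The video does this through the Stable Diffusion web UI; as a rough programmatic equivalent, here is a minimal sketch using the diffusers library. The model ID, output folder, and image counts are assumptions for illustration, not from the video:

```python
# Sketch: batch-generate regularization images with the same base model.
# Assumes a CUDA GPU and the diffusers/torch packages installed.
from pathlib import Path
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed model ID
    torch_dtype=torch.float16,
).to("cuda")

out_dir = Path("reg_images")  # assumed folder name
out_dir.mkdir(exist_ok=True)

prompt = "portrait photo of woman, curly hair"  # the prompt used in the video
count = 0
for _ in range(25):  # 25 batches x 4 images = 100 regularization images
    for img in pipe(prompt, num_images_per_prompt=4).images:
        img.save(out_dir / f"woman_{count:04d}.png")
        count += 1
```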
I am creating three folders to keep things organized. The first folder will contain the training images, the second the regularization images, and the last folder is the training directory. I already have the images ready, so I am just copying and pasting them into the relevant folders. Back in Kohya, I will now select the respective folders.

The repeats value represents the number of steps per image; in other words, it means how often each image is put into VRAM for training. It is best to be careful here, because there is an epoch setting, which I will explain later. The total number of steps I found best for my case is somewhere between 1,500 and 3,500. For now, remember that I have 54 training images, and if I set the repeats to 50, that means 2,700 steps for the training images alone. I also have to account for the steps from the regularization images: when regularization is used, multiply the steps by 2x, which would bring the total number of steps to 5,400, and I don't want that. So instead I will use 30 repeats. I am calculating the steps because there is such a thing as overtraining and undertraining the LoRA model. I can only give you a reference range here, because it really depends on how you want to train the model and on the quality and variety of the images; it's about finding the perfect balance. You will not get the desired results if the model is over- or undertrained.

After selecting the relevant folders, click on 'Prepare training data'. What this does is create folders in a specific format for the training: it puts the training images into a folder renamed with the repeats and the trigger words as defined, and it does the same for the regularization folder. You should now click on 'Copy info to Folders tab'. Everything is set correctly in the Folders tab; I will now rename the model, after which I will explain the parameter settings.

I will explain only the essential settings you need to tweak before training your model. You get a wide range of presets to select from; we will ignore the presets for now. Let's start with the LoRA type: I train my model with the Standard type. I don't have enough data, as I have not done rigorous testing for each type, but feel free to experiment; I suggest starting with the Standard type for now.

The batch size is how many images you want to put into the training at once. I have tested this, and two is an excellent value to start with: a batch size of two will train two images at a time, simultaneously. For character training, anything higher might give inaccurate results; this is because when different images are learned at the same time, the tuning accuracy for each picture drops. If you are training a style model, a higher value may give you better results; again, this all depends on your dataset. For my dataset, I found the best results at a value of one. Another thing to note here is that increasing the batch size divides the total number of training steps by the batch size: for example, my dataset has 3,240 training steps, and increasing the batch size to 3 would result in 1,080 training steps.
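As a rough sketch of that arithmetic (the function name and formula are my reconstruction of how the step counts in the video work out, not Kohya's exact code):

```python
# Sketch: estimate total training steps for a Kohya LoRA run.
def total_steps(num_images, repeats, batch_size=1, epochs=1, use_reg=True):
    steps = num_images * repeats * epochs
    if use_reg:
        steps *= 2  # regularization images double the step count
    return steps // batch_size  # a larger batch size divides the steps

print(total_steps(54, 50, use_reg=False))  # 2700: training images only
print(total_steps(54, 50))                 # 5400: too high for the 1500-3500 target
print(total_steps(54, 30))                 # 3240: the run used in the video
print(total_steps(54, 30, batch_size=3))   # 1080: same data at batch size 3
```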
Also, the higher the batch size, the more VRAM is consumed; take note of this if you are on a low-VRAM GPU.

One epoch is one cycle of learning; this includes the repeats and the batch size. For any LoRA, two or three epochs are good enough. I just went with the default value, again because when I increased the value to 2, the training steps went up to 6,480 and the results were inaccurate with my dataset. This is why I gave the range of 1,500 to 3,500 for a character training model; for DreamBooth checkpoints, these values are way higher. 'Save every N epochs' will save the LoRA model at each epoch interval as defined. For example, say your epoch value is 12 and you define to save every three epochs: Kohya will save the model file four times. This is an excellent way to test and compare the results at different epochs and use only the best model. By the way, you can easily test and compare these by using the XYZ plot under Scripts in Automatic1111.

As mentioned earlier, if your GPU is compatible, go with bf16; otherwise, stick to fp16 for mixed and saved precision. I have an AMD 7950X3D CPU; input a value of up to 70 to 80 percent of the number of logical processors you have. If you are training on an SDXL model, change the resolution to 1024x1024.

The learning rate is a bit complex; it's the rate at which the LoRA learns the dataset during a specific training run. Let me explain with a simple analogy. Say you are learning to ride a bicycle on a path with twists and turns and smooth, straight stretches. The learning rate here would be the rate at which you pedal. If you pedal too fast, you might ace the smooth part, but when it comes to the twists and turns, you might crash or miss the vital signs telling you which way to go. In the same way, with a high learning rate the LoRA might miss details from the dataset. However, if you pedaled very slowly, adopting a low learning rate, you would be safe, but reaching your destination could take a lot of time. In LoRA training, the learning rate helps you find the just-right pedaling speed: sometimes you might need to pedal faster when the road is smooth, and other times you might need to slow down for those tricky parts. This way you reach your destination safely without taking too long.

Learning rates in LoRA are written in scientific notation; I will explain how to calculate the exact decimal value in a bit. To understand it completely, you need to understand the learning rate scheduler and the optimizer, which I will explain at the end. But first, I want to talk about the text encoder and U-Net learning rates. These two values are separate from the main learning rate value; when you define them, they take precedence. The text encoder learning rate is usually set lower than the U-Net learning rate. This is because the text encoder is a relatively small but essential part of the model, and training it too quickly can lead to overfitting; overfitting means the model will perform well on the training data but poorly on new data it has never seen before. The U-Net is a larger and more complex part of the model, responsible for generating the image from the text embedding; that is why the U-Net learning rate is set to a higher value than the text encoder learning rate. The default values are 5 x 10^-5 for the text encoder and 1 x 10^-4 for the U-Net; the default text encoder learning rate for SDXL training is 4 x 10^-7. Calculating these in decimal is very simple; just follow what I am doing on the calculator.
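The calculator step is just expanding scientific notation; as a quick sketch:

```python
# Sketch: the learning-rate defaults expanded from scientific notation.
text_encoder_lr = 5e-5  # 5 x 10^-5
unet_lr = 1e-4          # 1 x 10^-4
sdxl_te_lr = 4e-7       # 4 x 10^-7 (SDXL text encoder default)

print(f"{text_encoder_lr:.7f}")  # 0.0000500
print(f"{unet_lr:.7f}")          # 0.0001000
print(f"{sdxl_te_lr:.7f}")       # 0.0000004
```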
In the SDXL-specific parameters, make sure 'No half VAE' is ticked.

Now to network dimensions. The default value is 8; during my research I found that 32 is also very good. However, after extensive testing and research, I highly suggest keeping the value at 128; for my dataset it worked great. To put it in simple terms, let's say you are building a Lego model. The Lego bricks are like the network dimensions in LoRA training: the more Lego bricks you have, the more complex and detailed your build can be, but the harder it is to build. If you only have a few Lego bricks, you are limited by the number of bricks, so you may not get the design exactly as you want, but it will be easier to build; fewer pieces, less time. A higher network dimensions value means you will have more control over the Stable Diffusion model, but it will be more difficult to train; a lower value means less control over the model, but easier training. I tried three values: 8, 32, and 128. For me, 128 worked best, but I learned the hard way that there is a correlation with network alpha, in the sense that the network alpha value is relative to the network dimensions value; the first four models I trained had a network alpha value of 1. Network alpha is referred to in technical terms as a dampener: it controls how much the LoRA model is allowed to change the Stable Diffusion model at each training step. Dampening slows down the learning process, which prevents the LoRA model from making too many changes to the Stable Diffusion model at once; too many changes at once can lead to unstable results. The default value is 1, which is pretty low if your network dimensions value is set at 128.
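One way to reason about that correlation: in common LoRA implementations, the LoRA update is scaled by network alpha divided by network dimension, which is why alpha acts as a dampener relative to the dimension. A minimal sketch of that ratio follows; note that this scaling rule is standard LoRA practice rather than something stated in the video:

```python
# Sketch: effective LoRA scale = network_alpha / network_dim.
def lora_scale(network_dim, network_alpha):
    return network_alpha / network_dim

print(lora_scale(128, 1))    # ~0.008: heavy dampening, the LoRA barely changes the model
print(lora_scale(128, 64))   # 0.5: moderate dampening
print(lora_scale(128, 128))  # 1.0: no dampening at all
```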
I tried setting both values at 128 and got my desired results. So technically, if your dimensions are 128, you could keep the network alpha value at 32 or 64 for more controlled results, and then go higher from there. If you set it at the same value as the network dimensions, it prevents the dampening, but may produce uncontrolled results. In simpler terms: if the LoRA has too much effect, set a lower alpha value relative to the dimensions value; if the LoRA has too little influence, set the alpha value closer to, or at, the dimensions value.

I recommend ticking 'Memory efficient attention' if you do not have enough VRAM; I trained the model with this option ticked. If you are very low on VRAM, tick 'Gradient checkpointing' as well. For samples, you can write a basic prompt and define whether to save a sample image every N steps or every N epochs. It is expected for the samples to be weird in the beginning and then turn out fine at later stages, as the model is learning. This will slow the training, as it has to generate sample images during the training process.

You would want your model to understand your dataset quickly and then decrease the learning rate gradually to pick up the finer details; an optimal training process does involve a variable learning rate. The Adafactor scheduler optimizes the learning rate according to the situation, and it pairs very well with the Adafactor optimizer. Constant does not change from beginning to end. Cosine starts with a very high learning rate which decreases to zero as the number of steps increases. Cosine with restarts is like a cosine curve, but it hard-resets and runs from a high learning rate down to zero multiple times throughout the training; to define how often it does this, you can set the value in the 'LR number of cycles' option. If this option is two or greater, the scheduler will run multiple times during a single training run; the default is blank, which equals a value of one. Linear is similar to cosine, but the reduction towards zero is a straight line rather than a curve; linear also starts at the learning rate setting. Polynomial is more intense than cosine, and you can tune how quickly it reduces to zero; I would use polynomial when the image dataset does not have many distinct features to be learned, and at the same time it is more efficient than constant. There are two options for polynomial: the first is the 'LR number of cycles' option, which works like it does for cosine with restarts (a value of two or greater makes the scheduler run multiple times during a single training run); the second is 'LR power', and if you set the power to 1, the scheduler has the same curve as the linear scheduler.

For my dataset, I used the 'constant with warmup' scheduler; see the sketch of these curves after this section. It starts with a learning rate of 0 and gradually increases towards the set learning rate value during warmup, then continues to use that rate during training. The 'LR warmup' option is applicable when you select this scheduler; I went with a value of 100 here, as I wanted to increase the learning rate across one hundred percent of the steps during warmup. Applying this method allows the model to learn the intricate details from your dataset, details that may be lost during training if the warmup is too short. I found this perfect for SDXL base and character LoRA training.

The list of optimizers is vast. Use AdamW 8-bit if you are low on VRAM; it works great and is accurate enough. However, if you have the extra VRAM, go with AdamW, which is 32-bit. Adafactor adjusts the learning rate appropriately according to the progress of the learning; when you choose Adafactor, the learning rate setting is ignored.
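To make the scheduler shapes concrete, here is a small sketch computing the learning rate at a given step for a few of these schedules. These are simplified textbook formulas under my own assumptions, not Kohya's exact implementation; the 100% warmup mirrors the setting described above:

```python
import math

# Sketch: simplified learning-rate curves for a few scheduler types.
def constant_with_warmup(step, base_lr, warmup_steps):
    # Ramp linearly from 0 up to base_lr during warmup, then hold it.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr

def cosine(step, base_lr, total_steps):
    # Start at base_lr and decay to 0 along a cosine curve.
    return base_lr * 0.5 * (1 + math.cos(math.pi * step / total_steps))

def linear(step, base_lr, total_steps):
    # Start at base_lr and decay to 0 in a straight line.
    return base_lr * (1 - step / total_steps)

base_lr, total = 1e-4, 3240  # numbers from the run described in the video
for s in (0, 810, 1620, 2430, 3240):
    print(f"step {s:>4}: cosine={cosine(s, base_lr, total):.6f}  "
          f"linear={linear(s, base_lr, total):.6f}  "
          f"warmup={constant_with_warmup(s, base_lr, total):.6f}")
```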
If you go to Utilities and then WD14 captioning, there is an option to automatically caption a description for each of the images and save it as a text file. It captions the description as per the photo automatically; however, you are supposed to fine-tune the captions manually. I have not done this for this tutorial, because when training a real person, captioning is really not needed; however, for checkpoint training, or for other subject or style LoRAs, you may need to do this process before starting the training.

I want to show you a trick to get your max training steps before starting the training. Go to the data preparation section, scroll down, and click on 'Print training command', then check the command prompt: here you can see all the details and how Kohya calculated the max training steps. Let's start the training, and then I will show you how to use the LoRA, along with a trick with the CFG scale. The training takes a long time; it goes way faster with the AdamW 8-bit optimizer.

Let's open Automatic1111. I have input the prompt and settings, and I want to show you how the CFG scale can be used here to fine-tune your character. Notice that I have to use the trigger words 'control alt ai woman' to get the trained character, along with the LoRA at weight one. After much testing, I found that a CFG scale of 14 gives me the best likeness to the original trained character, instead of retraining the model with different parameters; however, this is prompt-specific. This is how I achieved consistent results across various themes and styles, as showcased in the video's opening. If I keep increasing the CFG scale, it will overdo the character, and if I decrease the CFG scale, the character won't be what I trained in the LoRA at all. The fact that this works means the model was trained properly and is neither over- nor undertrained. Note that the CFG scale basically weighs in the prompt, and since the prompt uses the trigger words 'control alt ai', the weightage of those trigger words also gets weighed in.

There are many practical business applications for training your own LoRA, and this can stay within your company internally or, if you are a freelancer, within your computer: from the gaming industry to fashion, conceptual art, product design, interior design, and medical imaging, to name a few. You can enhance productivity in your workflow by training your own LoRA model. It's just a learning curve, and once you get the hang of it, you are all set. I hope you found this tutorial and the tips helpful. We really appreciate all the likes and subs; thank you for all the support given so far. Until next time!
Info
Channel: Control+Alt+AI
Views: 12,484
Keywords: lora training, stable diffusion lora, sdxl lora training, kohya stable diffusion, kohya gui stable diffusion, lora 7gb training stable diffusion, kohya, lora training guide, easy lora training, kohya lora, kohya ss, kohya ss lora, kohya ss install, kohya ss guide, how to train a lora for a charater, lora kohya, network rank, training parameters, kohya lora training, kohya tutorial, kohya training, lora training stable diffusion, kohya gui, install kohya, scheduler
Id: _F39RbO3tYo
Length: 32min 40sec (1960 seconds)
Published: Fri Sep 22 2023