Training LoRA with Kohya (theory included!)

Captions
Hello everyone! Today we want to generate a LoRA model by fine-tuning a model using Kohya. What does fine-tuning mean? It means generating a new model from an already trained model, to improve its performance or to perform a specific task. Differently from training a model from scratch, where you need thousands of images or more, fine-tuning allows us to generate new styles, new objects, new subjects in your images using just a few pictures. If you don't know what a LoRA model is, I already made a video on what they are and how to use them. The size of a LoRA model is way smaller compared to base models like Stable Diffusion version 1.5 and all the other checkpoints, and this means that training will also be quicker.

What's really important about this video is that by the end of it you will be able to generate your own LoRA model. I will show you which parameters I used for my training, but these can be different for your training, right? So instead of just giving you numbers, I would like to explain to you how the main parameters work in a neural network such as Stable Diffusion — I want to give you the tools for deciding which parameters to use for your own training.

For training the model we are going to use Kohya. You can find Kohya on its GitHub page, and the installation process is super easy. Then we are going to use Stable Diffusion for testing our trained model. Once we launch Kohya, this is the main interface, and here you have different tabs: Dreambooth, Dreambooth LoRA, Dreambooth TI (textual inversion), Finetune and Utilities. In today's video we are going to use two tabs: the Dreambooth LoRA tab for setting our parameters and training our model, and the Utilities tab, which is useful for captioning our training images.

The first step is to decide what to train the model on, so let's talk about images. We have two types of images: we have the training images
and the regularization images. The training images are those used for training the model, so the model will learn different features, different characteristics within the images. The regularization images are used so that the training does not overfit. While the training images need to represent what we are training the model on — in my case, my face — the regularization images need to represent the class of the object or subject we are training the model on, which in my case will be "woman".

For training a LoRA model we don't need many training images: for an object or a subject, between 5 and 25 images. But if you are training for a style you probably need more, something around 100 images or even more. Quality is very important, so it's better to have fewer images of good quality rather than a lot of images with very low resolution. The only requirement for the training images is that they need to have the same file extension, which can be png, jpeg, jpg; it doesn't matter if they have different dimensions or resolutions.

The model I'm going to use for my training is this one, Realistic Vision version 2.0, and on Civitai you can see which base model was used for fine-tuning it — in this case, Stable Diffusion 1.5.

So I have this folder here with all of my images; I'm using 25 in total, and as you can see I have different expressions on my face. This is because I want the model to learn different facial expressions: if you don't want the model to generate all of your images looking towards the camera, you need to train it with images where you're looking to the left, to the right, up, down. It's also important that you wear different clothes and have different backgrounds.

It's not required, but really recommended, that each image has a caption — a text file connected to each image, describing it. Captions can be created manually if you have just a few images;
otherwise you can use Kohya. In Kohya you go into the Utilities tab, then Captioning, and you have four different tabs with different techniques for creating your captions. I find BLIP captioning to be the best. So let's copy the path of this folder, go back to Kohya — I'm going to use BLIP captioning, you can use whichever you prefer — and paste it in here. Now this is the path to that folder. The caption file extension is .txt, which is fine. You can decide whether to add something before the description using this prefix, or something after the description — in that case you would add it in the postfix; this is completely optional. I can for example add "a photo of" as a prefix; I'm not going to add anything in the postfix.

Then I think that's it, really. You have these other settings which are more technical: max length and min length are just how long you want your description to be, while beam search and top p are more technical settings. Beam search is an algorithm for getting the best output, so I would recommend not changing these other parameters. Once everything is set, you just press "Caption images", and if you go into your terminal you will see it start loading the BLIP captioner and creating the text files. It shouldn't take long: for 25 images it took me three or four seconds, and then "captioning done".

We go back to our folder and you will see a text file next to each picture. Let me take this picture for example: I open the text file corresponding to it, and it says "a photo of a woman in a red hoodie looking at a cell phone". Obviously I wasn't looking at a cell phone — I just had my eyes closed. So if you want to improve your captioning, you should fix that. Obviously when you have a massive dataset you are not able to do that, but maybe you can tweak the descriptions of just a few images.
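The captioning step simply produces one .txt file per image, with the optional prefix prepended. As a minimal sketch of that output format (the folder path, file names and captions here are made-up examples, and `write_captions` is a hypothetical helper, not a Kohya function):

```python
from pathlib import Path

def write_captions(folder, captions, prefix="", ext=".txt"):
    """Write one caption file per image, Kohya-style:
    img_01.png -> img_01.txt containing prefix + caption."""
    folder = Path(folder)
    for image_name, caption in captions.items():
        caption_path = folder / (Path(image_name).stem + ext)
        caption_path.write_text(prefix + caption)

# Hypothetical captions, as BLIP might produce them:
# write_captions("training_images",
#                {"img_01.png": "a woman in a red hoodie, eyes closed"},
#                prefix="a photo of ")
```

Editing a caption afterwards is then just editing the matching .txt file, as described above.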
What I would suggest is to not rush: when you're training your model, just be patient and try to make everything as perfect as possible. In my case I did change the descriptions for all of my pictures — there are just 25, so it didn't take me a long time. What I did: I opened the text file and thought, okay, I'm not looking at a cell phone, so I can remove this; I can add "eyes closed"; I can add "head bent forward"; I'm going to add "ponytail"; I actually do have a red hoodie, so I'm happy with that; and I can add "wearing earrings". Once that's done, I just close it and save.

You don't have to describe every feature of your face. For example, I am brunette: if I want the model to generate all of the pictures exactly like me, with brunette hair, I don't have to specify "brunette", because the LoRA is going to learn it. But if I want the model to be able to generate me with blonde hair, then I do need to specify it, so in that case I would write "a photo of Laura, brunette" and then the rest of the description. Something else you want to describe is the background: if you have, I don't know, plants or a kitchen in your background, it's important to specify that in the caption.

Once we have our folder with our pictures and captions, we need to create the regularization images. Regularization images are optional, but also in this case I would really recommend using them — they really make the difference in most cases, even if you are just training a face. As I said before, these images should represent the class of your subject, object or style; in my case, "woman". How many regularization images? Well, you need plenty of them: I would recommend at least one or two images for each training image, and then to multiply this by the repeat number you are
going to use. "Repeats" is one of the parameters you can choose during your Kohya training, and it's very important because it is how many times each individual training image gets put into VRAM. In my experience this should be around 100, meaning each image gets put into VRAM 100 times. So if you have 25 images and 100 repeats, you need at least 2500 regularization images.

Now, obviously that is a lot, but I have a trick for generating these images quickly. The trick is to go to the Realistic Vision page — my base model — and grab the generation data for one of the images I like. I go into Stable Diffusion, paste this generation data into the positive prompt, and press this button. Then I change the positive prompt, because I want to generate just the class representing my training images, which in my case is "woman". So I remove all of these unneeded descriptions — this one as well — and I just type "woman", that's it. I remove the high-resolution fix and use 512 by 512 for the regularization images too, so I'll change this. I'm going to use a random seed, because I don't need a specific seed all the time — otherwise I'd get the same image every time. Then let's press Generate. Oh, very important, obviously: make sure you are using the model you want as the base one, which in my case, again, is Realistic Vision. And you will see you get a picture of a woman.

Now, you can generate a lot of images in a loop: just right-click on the Generate button and you will see "Generate forever". This means that as soon as one image is generated, it's going to generate another one, like in a loop. If I press this, you will see it starts up and it doesn't stop — it just keeps generating images.
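The arithmetic above — training images times repeats gives the number of regularization images to aim for — is just a multiplication; a quick sketch with the numbers from the video (`reg_images_needed` is a made-up helper name):

```python
def reg_images_needed(num_training_images, repeats, per_image=1):
    """Each training image is repeated `repeats` times, and you want at
    least `per_image` regularization images per repetition."""
    return num_training_images * repeats * per_image

print(reg_images_needed(25, 100))  # 2500
```

With 25 training images and 100 repeats this gives 2500, which is why 3475 generated images (as below) is comfortably enough.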
The nice thing is that if you add different features to your prompt — for example "blonde" — the output is going to change. In this case I don't want them all blonde, so maybe "blonde" is not ideal; I can maybe put "American" or "European", you know, just to have a little bit of diversity. Once you've found a good prompt, you can keep this loop running until you've generated the 2500 or more images, and once you're happy you can stop it by right-clicking on the same button, where you have "Cancel generate forever".

So I generated all of these — this is my regularization image folder, with 3475 images — and you can see I have diverse images of women. I know it's a lot, but I would recommend cleaning them up a little. For example, I had some images which were just a woman's eyes; I don't need that, I need the entire face of a woman, so in those cases I just deleted the pictures. It's fine if some are not perfect or are low resolution — it really doesn't matter.

Now we have created our training images folder and our regularization images folder, so we can start adding these folders into Kohya. Let's go into Kohya and into Dreambooth LoRA. The Source model tab is the first tab and it's very important: it's where you choose the model you will use as a base. You can quickly pick the model from this drop-down list — you have some models in here already, like Stable Diffusion 2.1, Stable Diffusion 1.5, 1.4. If you want to use a different model, you will need to type the path to the model in here. Before doing that, I would like to show you that you have here two different checkboxes: "v2", which you need to tick if you have a version 2 Stable Diffusion model, and "v-parameterization", which you need to tick if the model was trained on 768 by 768 dimensions. If you pick the model from this drop-down list, these two
checkboxes will be updated automatically. As you can see, in this case "stable diffusion 2 base" is a version 2 model, so "v2" was ticked, but it was trained on 512 by 512, so "v-parameterization" wasn't. Now, if you're going to use your own model, you have to be aware of these two checkboxes, because they are not updated automatically.

So what I'm going to do: I click on this folder icon and navigate into the folder where I have my model — I have it with my Stable Diffusion models, here you go, this is the folder where I keep all of my Stable Diffusion models. I select the folder, and then you need to add the name of the model: you add a backslash, then you go into stable-diffusion-webui, into models — well, you need to go wherever your model is, right? I have it in here, under Stable-diffusion. Then I copy the name — you can go into Properties and just copy the name of the model — come back here and paste it. Now, I know that Realistic Vision, as you can see from its main page, was trained on top of Stable Diffusion 1.5, so you don't need to tick these two boxes — as I said before, if you choose Stable Diffusion 1.5 these are not ticked. So I'm going to change this again, paste mine, and leave it like that. This is the first tab we need to fill in.

Then "Save trained model as": safetensors is completely fine; that's usually what LoRA models are. Checkpoints and safetensors are actually the same thing — the only difference is that safetensors are safe, as the word says, meaning they cannot contain malicious code.

After that we go into the Tools tab, which is where we set up our folders. You have the training images here: you click on this button, navigate into the folder where you have your images with the captions, and select the folder. You do the same for the regularization images, which I have on my desktop:
select the folder, and here are the paths to these folders. Then you also need to choose a folder where you want your model to be saved after training. I'm going to create a new folder in here — I have different trainings, as you can see — and I'm going to call it "training 13"; you can call it whatever you want. So I can go here and either click on this folder icon again, or copy and paste the path directly, or just navigate to that folder — whatever is easier for you — and select it. Actually, I picked the wrong one here: instead of "images" it should be "image", because that's where I have my captions as well, so I'm just going to change this.

Then you have the instance prompt and the class prompt. The class prompt is "woman", so I'm going to type "woman" here. If you are training your model on cats or dogs, you would have "cat" or "dog"; if it's a toy, you would write "toy"; if it's an anime girl, you probably want to type "1girl". And then here you have the instance prompt, which is basically the name of what you're training. In my case I'm going to use my own name, which is Laura. What seems to work very well is to use some numbers in the name: for example, instead of typing "laura" like this, I can swap the "a" with a "4", like "l4ura". I don't know why, but this seems to work better with the model. The instance prompt plus the class prompt is what we usually call the trigger word: once you have trained your model and you want to create a picture of yourself, you will need to write "a photo of l4ura woman" in the positive prompt — we'll have a look at this later.

Then you have repeats. We talked about repeats before: this is how many times each image is put into the model — we could call these iterations as well. I found that a good number of repeats is between 70 and 100, so I'm going to use 100 in my case, because it was giving me better results, but I would recommend trying different numbers of
repeats as well. For the regularization images I have plenty, so I'm going to use 1 repeat for them — that's fine. If you are not using regularization images, this repeat number obviously doesn't matter.

Once you are done with that, you just press "Prepare training data", and if you go into the folder you created — "training 13" for me — you will see there are four different folders inside. You have "img", with the images for training the model; if you click inside it, you have "100_l4ura woman", which is the number of repeats plus the trigger word, and inside that you have the images with their text files. Great. Then you have "log", which is the folder where the log files will be saved after training — it's empty for now. You have "model", which is also empty, because it will be filled after training with the trained model. And then you have "reg", which is the folder for the regularization images: if I click here, I have "1_woman" — again, the repeat number and the class prompt — with all of my regularization images inside.

Once this is ready, you can go back into Kohya and press "Copy info to Folders tab". Then you switch from the Tools tab to the Folders tab, and you will see that this tab has been filled with all of the paths to the newly generated folders, which is great — everything is automatic. The only thing I would suggest changing here is the model output name, which is simply the name of the model you're going to generate; I'm going to call it "lora_face".

Once this is done — so we've filled in Source model, Folders and Tools — we need to go into Training parameters, which is the most important tab, because it's where we set our key parameters. I'm not going through all of these parameters, but I'm going to show you and explain the most important ones. But before doing that, I would like you to understand how a neural network works.
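The "Prepare training data" step just builds a fixed folder layout; here is a sketch that reproduces it, with the naming pattern taken from the example above (`prepare_kohya_folders` is a hypothetical helper, not part of Kohya):

```python
from pathlib import Path

def prepare_kohya_folders(root, repeats, instance_prompt, class_prompt,
                          reg_repeats=1):
    """Create the four folders Kohya's 'Prepare training data' makes:
    img/<repeats>_<instance> <class>, log, model, reg/<reg_repeats>_<class>."""
    root = Path(root)
    img_dir = root / "img" / f"{repeats}_{instance_prompt} {class_prompt}"
    reg_dir = root / "reg" / f"{reg_repeats}_{class_prompt}"
    for d in (img_dir, root / "log", root / "model", reg_dir):
        d.mkdir(parents=True, exist_ok=True)
    return img_dir, reg_dir

# The layout from the video would be:
# prepare_kohya_folders("training 13", 100, "l4ura", "woman")
```

The number prefix on the image folders is how Kohya reads the repeat count, which is why the folder names matter.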
When you are training a model, it's very important to understand how it works, because only then will you be able to choose what's best for you, and you can understand as well how to change the parameters in order to get an optimally trained model.

A neural network is a computing system inspired by the human brain. It's made up of layers, which work together to make sense of data. The layers can be divided into the input layer, multiple hidden layers in the middle — where all the calculations and the magic happen — and the output layer, which is obviously the final result. Now let's imagine you're training a neural network to recognize a cat. You start with a large collection of cat images, which is the input to the neural network. Each image is divided into pixels, and once they are fed into the network, each layer between the input and the output layer tries to identify features inside the images. The deeper the layer — the higher the number of layers in the network — the more complex the patterns the model is able to identify. For instance, the first layers will recognize the edges of the image, and the deeper you go, the layers will be able to recognize the ears, the nose of the cat, and so on and so forth.

Training a neural network requires a lot of trial and error: initially the network will not be very good at recognizing cats, but every time it makes a mistake, that provides feedback, and the network is adjusted based on it. This process of adjusting is called backpropagation. The main goal of training a neural network is to minimize the difference between the model's predictions and the actual outcome.

Now let's go a little bit deeper. Do you remember all those images of cats we are going to feed into the neural network to train it? Well, you can do
that one by one — inputting one image at a time into the neural network, training it, letting the network adjust itself, then feeding it another image, and so on and so forth. Instead of doing that, one image at a time, you can decide to work in batches: dividing the entire dataset into groups and feeding the model not one image but a group of images. This helps the model learn what's in the images more efficiently, and it is also quicker. Why is it more efficient? Well, imagine again that you are training a model on cats, and you have a cat in a specific position. If you give the model just one picture of that cat, the model will learn that a cat in that exact position is a cat, but if you show the model another cat in a different position, it won't understand that there is a cat. That's why we give it a batch, a group of images: because we are giving it a sample with cats in different positions.

So the neural network takes the first batch of images, processes it, and generates an output — this is called forward propagation. Then it adjusts its parameters internally to reduce the error — this second part is called backpropagation. It takes the first batch, forward- and back-propagates, then takes the second batch, forward- and back-propagates, and so on until the last batch. The process of going through all of these batches once is called an epoch. Different from the epoch, the iterations or steps refer to one update of the model's parameters: for example, if you have five batches in a single epoch, the number of iterations equals the number of batches the data is divided into. Through many epochs the network gradually gets better and better. We want it to make the smallest possible number of mistakes, and in technical terms we say that we want to minimize the loss function.
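Batches, epochs and steps relate by simple arithmetic. With the numbers used later in the video (25 images, 100 repeats, batch size 2, 4 epochs), a sketch might look like this — note that Kohya's exact step count can differ (for example, it may double the count when regularization images are used), so treat this as the basic formula only:

```python
import math

def training_steps(num_images, repeats, epochs, batch_size):
    """steps per epoch = ceil(images * repeats / batch_size);
    total steps = steps per epoch * epochs."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs

print(training_steps(25, 100, 4, 2))  # 5000
```

So smaller batches mean more parameter updates per epoch, which is part of the efficiency trade-off described above.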
If we look at this picture, the orange curve is the loss function. The process of training the neural network involves finding the lowest possible point of this function. Ideally this is the global minimum — the lowest point overall, which in this case is this one — but many times it's very difficult to reach the global minimum, so many models end up in a local minimum, which is still a low point in the function, just not the lowest. In each iteration the neural network adjusts its parameters to take a step towards this minimum, something like walking down the hill, and it does that in steps. How big the steps should be is determined by the learning rate.

The learning rate is a value between 0 and 1, and sometimes it can be tricky to find the right one. If you have a high learning rate, the network changes a lot during each update, and it may never find the minimum, because past a certain point the step is too big to reach it — it would need a smaller step. If you choose a low learning rate, you will probably need more epochs, because it takes the model longer to get towards the minimum. So what you need is a middle ground between a too-high and a too-low learning rate. Keep in mind that if you're using a lot of epochs, you will probably want to reduce your learning rate, while if you're using a high learning rate, you probably want a lower number of epochs.

Now that we know how a neural network works, we can go back into Kohya and set our training parameters. We have "Train batch size", which is the size of the batches. As I said before, I have 25 images; I tried using a batch size of five — so groups of five images — and this was giving me quite good results, but then I found that a train batch size of 2 was giving me the best results, so that's what I'm going to use (I can show you my results afterwards). For the number of epochs, I found that 4 was working well for me.
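The "steps down the hill" idea is plain gradient descent. A toy sketch on a one-dimensional loss shows how the learning rate controls the step size — the function and the learning-rate values here are purely illustrative, not anything Kohya computes:

```python
def gradient_descent(lr, steps, x0=0.0):
    """Minimize loss(x) = (x - 3)**2, whose gradient is 2*(x - 3).
    Each step moves x against the gradient, scaled by the learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * (x - 3)
    return x

print(gradient_descent(lr=0.1, steps=100))  # converges very close to 3
print(gradient_descent(lr=1.1, steps=100))  # step too big: overshoots and diverges
```

With a small learning rate the iterate settles into the minimum; with a too-large one each step overshoots further than the last, which is exactly the "step too big to reach the minimum" failure described above.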
That worked pretty well without changing the learning rate. As I said before, the learning rate should be between 0 and 1; this is the default learning rate, 0.0001, and I find that this learning rate with 4 epochs works pretty well. If I were using 2 epochs, for example, I would probably have increased my learning rate slightly.

"Mixed precision": I would recommend using it if you can — this trains the model faster using less memory. There are two types of lower precision, float16 and bfloat16, and each of them takes 16 bits of memory. Nvidia GPUs can run operations faster in float16, while TPUs can run operations faster in bfloat16. I'm on Windows, and I'm going to use fp16.

We discussed the learning rate; here you also have the learning rate scheduler, with different schedulers to choose from. What is it? It's simply a mathematical function which adjusts the learning rate at each iteration, so if you use cosine, the learning rate is going to change following the cosine function. Now, if you're fine-tuning a model this doesn't really matter much, so what I would recommend is just to use a constant learning rate, meaning the learning rate does not change between iterations. If you are training a model from scratch, then I think the scheduler is very useful, but for now I'm just going to use constant — I'm fine-tuning, so it doesn't matter much. The learning rate warmup lets the learning rate ramp up over the first steps of the training; also in this case, given that we are fine-tuning a model, in my opinion this is not relevant, so I will just use zero.

Then you have the optimizer. The most common one is Adam — if you open this drop-down you have plenty of them, and you can try different options, but again, what I would recommend is Adam. There are mainly two Adams I would use: AdamW and AdamW 8-bit. If you have time for training your model, I would go for AdamW; if you don't, I would go for AdamW 8-bit.
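A scheduler is just a function of the current step. A sketch of constant vs. cosine — the cosine formula below follows the usual cosine-annealing shape, and Kohya's exact internals may differ, so take it as an illustration of the idea:

```python
import math

def constant_lr(base_lr, step, total_steps):
    # Constant scheduler: the learning rate never changes.
    return base_lr

def cosine_lr(base_lr, step, total_steps):
    # Cosine annealing: starts at base_lr and decays smoothly towards 0.
    return base_lr * 0.5 * (1 + math.cos(math.pi * step / total_steps))

base = 1e-4  # the default learning rate mentioned above
print(constant_lr(base, 2500, 5000))  # 0.0001 at every step
print(cosine_lr(base, 0, 5000))       # full base_lr at the first step
print(cosine_lr(base, 5000, 5000))    # decayed to ~0 by the last step
```

For fine-tuning, as said above, the constant function is usually enough.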
They are basically the same, but AdamW 8-bit trains faster, which means the precision of the final output won't be as good as with AdamW — you're saving time. I'm going to use AdamW. One important parameter is "Max resolution": this has to be consistent with the images we are using for training, and in our case that's 512,512, so I'm going to keep it like that.

And that's it: once you're done, you just press "Train model" and it will start. I'm going to press it now, and you will see inside your terminal how everything is going: it tells you how many steps, how many iterations, how many epochs — it gives you a recap of everything, which is quite useful, and you can follow the training along. Usually it doesn't take much time — maximum 20-25 minutes in my case, though obviously it depends on the power of your computer. LoRA models should be very quick, because you're training with just a few images and the output has a very small size. Okay, I'm going to stop my training now, because I already trained my model, and go back to Stable Diffusion to show you the trained model.

Let's refresh. I'm in txt2img; I pick the model I used as a base for training my LoRA, and then, again, I grab the generation data from one of the images on Civitai, paste it in here and press this little button. Then I remove all of the descriptions I don't want to keep — I'll keep just the lighting, quality and resolution descriptions — and I add the trigger word for my LoRA model, which is "a photo of l4ura" — written like that, with the number, if you remember — and then "woman" as the class; you always need to specify that. And then you need to reference the LoRA model inside the positive prompt.
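The LoRA reference in the positive prompt uses the webui's `<lora:name:weight>` syntax. As a sketch of how the final prompt is assembled (`lora_prompt` is a made-up helper; the model name and weight are just this example's values):

```python
def lora_prompt(trigger, lora_name, weight=1.0, description=""):
    """Build an A1111-style positive prompt that activates a LoRA:
    trigger word first, then the description, then the <lora:...> tag."""
    parts = [f"a photo of {trigger}", description,
             f"<lora:{lora_name}:{weight}>"]
    return ", ".join(p for p in parts if p)

print(lora_prompt("l4ura woman", "lora_face", 1.0, "ponytail, closed eyes"))
# a photo of l4ura woman, ponytail, closed eyes, <lora:lora_face:1.0>
```

Clicking the LoRA card in the webui, as shown next, inserts that same `<lora:...>` tag for you.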
Once the model is trained with Kohya, if you go into the training model folder, you will have all of the saved models. I have these three because I was saving the model after each epoch, and I had 4 epochs; the final model is this one, "lora_face" — the model after the four epochs. Then what you have to do is drag-and-drop or copy-and-paste this file into your stable-diffusion-webui models\Lora folder — I'll show you where: I have stable-diffusion-webui, I have "models" here, I have "Lora", and this is where you drop the final model you trained. I have many in here.

When you are in Stable Diffusion, you can press this button to open all of your LoRA models, and if you click on the LoRA you created — I'm going to choose this one, "lora_face" number 8, which was the best in my case — the reference to your LoRA is added, and you're now able to generate your first picture with your face. I press Generate and... this is my — well, I look — she looks like me! I find it quite cool.

For example, we can also change the prompt to have something more detailed in the description. I have included "ponytail" — let's see what happens; in theory she should have the hair on her back. You can also add "closed eyes", maybe — let's see what happens, because I also trained the model on having my eyes closed. Yeah, here you go, and it's working pretty well — sometimes when you use normal models and type "eyes closed" it usually doesn't give you closed eyes, so this is quite good, I would say. Or if I write "profile, looking to the right" and remove "closed eyes" — let's see what happens; maybe I can add "eyes closed" in the negative prompt, and maybe "Restore faces" as well. Yeah, she's definitely looking to the side now.

So yeah, that's how it works. As I said initially, I didn't want to give you just numbers; I wanted to give you more information on how the neural network works, so
that you can decide what's best for you and tweak all of the key parameters to get the best result. Sorry if it was too long, but I hope it was useful. See you in the next video — bye!
Info
Channel: Laura Carnevali
Views: 59,174
Keywords: stable diffusion, stable diffusion v2, diffusion, ai art, diffusion model, generative ai, generative art, stability ai, ai artist, imagen, nft, install stable diffusion, install stable diffusion on mac, install stable diffusion apple silicon, apple silicon, stable diffusion on m1, stable diffusion with python, stable diffusion hugging face, stable diffusion github, stable diffusion v1.5, stable diffusion tutorial, lora, lora training, lora model training, lora model, lora sd
Id: xholR62Q2tY
Length: 36min 18sec (2178 seconds)
Published: Thu Jun 22 2023