Training a LoRA Model of a Character | LoRA Training Guide | Stable Diffusion Kohya SS A1111

Video Statistics and Information

Captions
Hello, this video is about training a LoRA model for Stable Diffusion. We will explain how to produce a perfect LoRA model for a character, for a real person, that has almost 100 percent resemblance to the original subject. My training dataset does not contain square images, so the LoRA training can crop the images and still produce perfect results, even for faces. So let's discuss this, the difference between using a small dataset and a large dataset, and the settings that can help us produce these results faster. The training dataset I'm going to use, which produces these perfect results, actually contains 330 images; we will see the difference between training on 21 images and training on 330 images, and the kind of results that are produced.

In order to train a LoRA we need to follow these steps. First of all, after installing Kohya SS from its website, we prepare the dataset. The dataset is the most important part of LoRA training; all the remaining settings are less important. The network dimension, the alpha value, the captioning, all of these things matter less. What matters most is having a good dataset. A good dataset, for example for a person, should have different poses, different lightings, and different body shots: face pictures and body pictures from the front, from the back, and from the side. We should have some balance in the dataset. For example, we don't need many face shots; three to four faces are enough. But we need many body shots if we want good body shots: if we don't have a large number of full-body and half-body shots, we will not be able to get good results.

After this point we caption the dataset, set up the folders, set up the training, and train for a couple of epochs. Then we compare the epochs based on the initial sample results produced by Kohya SS, do the comparison in Stable Diffusion using an XYZ comparison, and select the best model. The best model is the one that can generate the same person with new clothes and new colors, with a high level of similarity.

In some cases our dataset might only have 20 pictures, or some other small number. This is why, if we want a good model for body shots, we need to create an initial model using the initial data, generate new pictures from that model, and augment the data by adding these new pictures to the original dataset, until we have, say, 100, 200, or 300 pictures. When we have a large number of pictures, the LoRA will be able to train on this data with a high level of flexibility and great results.

Now, if I want to download real pictures from the internet, I use an Instagram album downloader, which allows me to download an entire album from Instagram. This is generally not okay for commercial use; it's only for educational purposes. The 512 by 512 images should be kept separate from the 512 by 768 images, so we can resize and save them in bulk. In general I prefer a tool from the Microsoft Store that lets us resize sets of images very quickly: we define the dimensions, for example 512 by 768 or 768 by 768, and it resizes the images the way we want.
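As a rough illustration of that bulk resize-and-sort step (not the Microsoft Store tool from the video), here is a minimal Python sketch using Pillow; the folder names and the portrait/square rule are assumptions:

from pathlib import Path
from PIL import Image

SRC = Path("raw_images")                 # assumed input folder
PORTRAIT = Path("dataset_512x768")       # tall sources
SQUARE = Path("dataset_512x512")         # everything else

for out in (PORTRAIT, SQUARE):
    out.mkdir(exist_ok=True)

for path in SRC.glob("*.jpg"):
    img = Image.open(path)
    w, h = img.size
    # send tall images to the 512x768 set, the rest to 512x512
    target, out_dir = ((512, 768), PORTRAIT) if h > w else ((512, 512), SQUARE)
    # plain resize; a real pipeline might center-crop first to keep the aspect ratio
    img.resize(target, Image.LANCZOS).save(out_dir / path.name)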
To start the training we run the Kohya SS LoRA graphical user interface, which runs in the browser like this. Initially it opens on Dreambooth; we will go to Dreambooth LoRA, but first we have to prepare the images. Usually we have to caption them. Captioning can be done using WD14 captioning or BLIP; in general these are the most common methods. BLIP is often used for people and WD14 for anime, but I have seen that WD14-style tags are used more by the Stable Diffusion models, so I prefer WD14. We select the folder with the images that we want to caption, then we remove undesired tags. What does that mean? This step is optional, so we don't have to do it, but it's better to make the traits we want part of the LoRA itself. For example, all of our images contain one girl; we already know that we have a girl, so we can remove that tag. We also want to remove "solo", because we don't want these captions to repeat in every image. Anything that we want to be able to change must remain in the captions: if we want to change the hair color, the hair tags must stay; if we want to change the dress, it must stay. If instead we want the LoRA to always produce the same hair, we would remove "brown hair", because we want the LoRA to have brown hair all the time; but if we want the hair to change, we keep it in the caption. For complex datasets we don't really need to do this; the training will be much better if we leave all the details in, so the model will be more flexible. So we remove the undesired tags and run the captioning (a small offline sketch of this tag removal follows this part); once it finishes, it will tell us that captioning is done. The software could say that this is a man and not a woman, for example, or make other mistakes, so it's a good idea to come back and double-check the captions one by one if we want perfect results. Often, though, the automatic captions are good enough.

After this point we can go to Dreambooth LoRA and prepare the settings. We select the checkpoint; if we want to use ChilloutMix, for example, we select it. The LoRA will rely partially on some of the weights of this base model, so a LoRA produced on ChilloutMix is unlikely to work perfectly on Realistic Vision, for example. If we want to train on Stable Diffusion 2 we have to check V2 and v-parameterization. We can also prepare the folders. The folder preparation can be done by setting the class prompt; in our case that's "woman", and the instance is "Olivia", or "Olivia Casta". We select the training images and set the regularization images. This can also be done manually; I usually do it manually because it's faster and I always use the same folder. We define the number of repeats, which is how many times each image is used in every epoch; the number of repeats for regularization is often one. So what are regularization images? Regularization images are basically images generated by the same Stable Diffusion model, or any other model actually, but it's better for them to be produced
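Here is that tag-removal sketch, assuming the WD14 captioner wrote one comma-separated .txt caption file per image; the GUI's "undesired tags" field does this during captioning, so this is just an offline equivalent:

from pathlib import Path

CAPTION_DIR = Path("dataset_512x768")    # assumed: caption .txt files sit next to the images
UNDESIRED = {"1girl", "solo"}            # tags that are constant across the whole set

for txt in CAPTION_DIR.glob("*.txt"):
    tags = [t.strip() for t in txt.read_text().split(",")]
    # keep only the tags describing things we still want to be able to change
    kept = [t for t in tags if t not in UNDESIRED]
    txt.write_text(", ".join(kept))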
by the same model. For example, like here: this is the same set of images that I use for a person, generated by Stable Diffusion. They are basically random images, which you can generate by going into Stable Diffusion, writing just "a woman", setting the batch count to produce 400 or 1000 images, collecting them into one set, and removing the redundancies. It's better to remove redundancies, but it's not very important. Regularization images are generally used to regularize the training, which reduces overfitting and makes the model more flexible. They are not obligatory; you can also train a LoRA without regularization images. However, in my tests I have seen that regularization images reduce the overfitting effect and increase the flexibility of the model, so it's better to use them.

Now for the further preparations. We need three folders: the training images, the class (regularization) images, and the log. We can either use the built-in preparation, where we define the training images and regularization images, set the instance to "OliviaCasta v1" for example and the class to "woman", define the number of repeats, and let it produce the results in a certain folder and copy the info into the Folders tab; or we do this manually. In the folder name we define the number of repeats first. In my case here, because I have a small dataset, we will use 40; with a large dataset, 10 is enough. Then an underscore, the name of the model (the trigger word), then a space, then "woman", which is the class name. If we are training a man we use "man"; if we are training both men and women we use "person"; if we are training an object we would say, for example, "object", or something general. The regularization folder usually gets one repeat (see the folder-layout sketch after this part).

After we're done here, we go to the Folders tab and define the folders: the images, the model, the log, and the regularization set. We define the model output name, which is the name of the file that will be produced at each epoch in safetensors format. In the training parameter settings we keep the defaults for the LoRA type. If we are continuing a previous training, for example if I ran for four epochs and want to continue, or if I ran for five epochs and decided to test one more epoch, we come here and choose the last generated epoch file; it will load the weights and resume the training from there. The batch size depends on the computer specifications: if we have over 8 gigabytes of VRAM with a good graphics card, it's possible to use two, three, four, and so on; the larger, the faster. If you have limited VRAM, for example 8 gigabytes or under, then using one is more than enough. For the epochs we define a number, for example 10; we might break the training in the middle if we find that the results are overfitting, and I will explain how this is done. For RTX graphics cards it's better to use bf16, but not necessarily; we could also use fp16. I use bf16 because I have an RTX card. For the learning rate it's better to keep the default; we can also scale it up with the batch size, for example twice the default with a batch size of two and four times with a batch size of four, but keeping the default is fine.
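Here is that folder-layout sketch in Python; the project path and the "OliviaCasta" trigger word are placeholders from the video's example:

from pathlib import Path

root = Path("lora_project")              # assumed project root
repeats, instance, cls = 40, "OliviaCasta", "woman"  # 40 repeats for a small set, ~10 for a large one

# Kohya reads the repeat count from the folder name: "<repeats>_<instance> <class>"
(root / "img" / f"{repeats}_{instance} {cls}").mkdir(parents=True, exist_ok=True)
(root / "reg" / f"1_{cls}").mkdir(parents=True, exist_ok=True)  # regularization usually gets 1 repeat
(root / "model").mkdir(parents=True, exist_ok=True)             # per-epoch .safetensors files land here
(root / "log").mkdir(parents=True, exist_ok=True)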
Regarding the network dimension: in training people I've tried different settings, and I've seen that the most effective ones are 64/32 or 128/32, that is, the network dimension with an alpha of half that value. However, 128 for 20 images is just too much; actually 16 is enough. I've seen that 64 in general produces good results, so for a small dataset 64 is more than enough. We don't need 128 to train on a set of just 20 images that are three megabytes in size; it's not realistic. Even Stable Diffusion itself fits inside an approximately four-gigabyte neural network and is trained on hundreds of thousands of images, so if we have around 20 images, even a network dimension of 16 or 8 is enough for this dataset.

The maximum resolution will be read automatically. We will enable buckets. Why do we enable buckets? If we have different resolutions in the dataset, we enable buckets: for example, some images are 512 by 512 and others are 512 by 768. If all our images are 512 by 768, we don't need to enable buckets.

In the advanced settings we basically don't need to change anything. It's also good to have xformers installed and enabled, because it makes the training faster; without xformers the results could be slightly better, but the difference is almost unnoticeable. Flip augmentation means the pictures will be flipped, which augments the dataset and can produce better results in some cases; however, it makes the training much slower. Then there's clip skip: clip skip skips the last layers of Stable Diffusion's text encoder during training, which makes training faster. Clip skip 1 is slightly slower but can produce slightly better results, so it's good to use clip skip 1 for people; in general we keep the default of 2.

Now, a very important thing is to use the sample images config. This is very important and, in my opinion, necessary, because it lets us generate images after every couple of steps, or after each epoch. If we set it to generate after every epoch, it will render this prompt and give us an idea whether the current epoch is starting to overfit. If the image starts to break, it means we need to stop the training; we don't need to proceed to 10 epochs. In other cases we might need to continue to 20 epochs or even more. Regarding the prompt, we should use a simple, general one, and we should ask for clothing that does not exist in the training dataset. For example, if the training dataset does not contain a yellow shirt, we ask for a yellow shirt: if we ask for yellow and it produces a blue shirt, the model is overfitting, because it has only seen blue shirts in the training data. We need to ask for something that does not exist in the dataset to make sure the model is actually flexible.
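That probe prompt can be written into a prompts file for the trainer; a minimal sketch, where the --n/--w/--h/--s/--d flags follow the sd-scripts sample-prompt syntax as I understand it, and the trigger word "OliviaCasta" is the assumed instance name:

from pathlib import Path

# One probe prompt per line. A yellow shirt is deliberately absent from the
# training set: if the samples come out blue anyway, that epoch is overfitting.
prompt = ("OliviaCasta woman wearing a yellow shirt, simple background "
          "--n low quality, blurry --w 512 --h 768 --s 28 --d 1")
Path("sample_prompts.txt").write_text(prompt + "\n")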
After that we press Train model and the training starts. After a while we see that the model generates a file for each epoch; when the network dimension is 128, the file size will be 144 megabytes. We track the progress of the training and see, for example, that at the ninth epoch the loss is dropping from 0.85 to 0.8. Usually, when the loss drops significantly, it means the model is starting to overfit. In general, when the loss drops quickly at the beginning, for example from 0.12 down to 0.096, things are heading in the right direction; but when the loss starts to drop significantly later on, for example from 0.90 to 0.85, it most likely means we are starting to overfit. The second thing is that the sample folder will contain the images for each generation. Here I did not revise the captions at all, because the dataset is very large, so I expect that the training will go just fine even if the captions are not 100 percent perfect.

So we will run the model once again under a new name, for example version V1: the same model, because we want to do the comparison, the same class, the same settings, and I will use 10 epochs. Because I'm going to use a large dataset, I will increase the network dimension up to 128 and see the results. The batch size is two and the epochs are 10. Now, sometimes we might get errors, because we have thirty-three thousand steps in total; if we get an error, we can reduce the batch size down to one. The total is the number of training images times the repeats times the 10 epochs, and together with the regularization images we get a total of 33,000 steps.
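The 33,000-step figure is simple arithmetic; a minimal sketch, assuming 330 training images at batch size one (how regularization images count toward the total depends on the trainer):

# steps = images x repeats x epochs / batch size
train_images, repeats, epochs, batch_size = 330, 10, 10, 1
total_steps = train_images * repeats * epochs // batch_size
print(total_steps)  # 33000, matching the figure quoted above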
Regarding the training runs in this window: I trained the different models using the same window, and I think it's useful to check the settings after launching a run, because sometimes you might have made a mistake. Now let's compare the different models that have been produced. This is "OliviaTest", and I copied the epochs that I want to compare with each other. Version 0.1 is the small dataset; this is the large dataset with a network dimension of 128; and this is the model with a network dimension of 64.

We will compare these models using Stable Diffusion with an XYZ comparison. In general, when we prompt, providing more details produces more accurate results, like here: the picture will be more accurate. You can generate another picture and check the results; we can see that the generated image is very similar to the person we trained on. If we reduce the prompt, the generated image will be slightly different: it will be very close to the target, but slightly different. This is why prompting actually matters for the results. It's still similar to our original images, this is Olivia, so it's still very similar to the person, but if we don't include the additional details, the similarity becomes lower. That's because when I captioned the pictures I kept the complete details and did not edit the captions; I wanted as much flexibility as possible, so that it becomes possible to change the hair color, the shirt, the eye colors, basically everything.

So this is the conclusion: increasing the network dimension makes the learning faster, and it's very important to have a large network dimension if we are training on large datasets. We also need a quality dataset; this is the most important part of training a LoRA. A quality dataset will produce quality results, results that can change the clothes, the hair, the color of the eyes, all the details in the picture. This is it, and have a good day.
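For reference, one common way to reproduce the epoch comparison above in A1111 (an assumed setup, not shown step by step in the video) is the X/Y/Z plot script with Prompt S/R swapping the per-epoch filename suffix:

Prompt:   <lora:OliviaTest-000002:1> OliviaCasta woman wearing a yellow shirt
Script:   X/Y/Z plot
X type:   Prompt S/R
X values: OliviaTest-000002, OliviaTest-000004, OliviaTest-000006, OliviaTest-000008, OliviaTest-000010

Each cell of the grid then renders the same seed and prompt with a different epoch's weights, so the epoch that keeps the yellow shirt while preserving likeness wins.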
Info
Channel: How to
Views: 127,826
Id: clRYEpKQygc
Length: 21min 58sec (1318 seconds)
Published: Mon Jul 03 2023