LoRA Clothes and Multiple Subjects Training for Stable Diffusion in Kohya SS | Fashion Clothes

Captions
Hello. This video is about LoRA training of clothes. The principles applied to clothes training are similar to those for a person or any other type of object, with minor differences. What I want to train is a set of clothes, for example this dress, which is on a plastic mannequin, and I want to produce results like this: we had a mannequin before, and in the result the mannequin is removed and the clothes are dressed on another person. The second subject I will train is a pair of jeans; these are the jeans I am training, shown on a person, and I want to see how they can be dressed on another person, like this. So I will train two subjects in the same LoRA and discuss the differences between training with and without a regularization set, plus some recommendations regarding the training process.

The first step is to prepare the data. Assume we have the jeans and a blue dress, and I want to train a LoRA for this dress and these jeans. We should gather some data: if we have a clothing store, for example, we can take pictures of a mannequin wearing the dress from different angles; using real people in the training images is also a possibility. It is very important to note that the LoRA will learn everything in the pictures, so it will learn the mannequin as well as the dress. If the dress appears more often, it will be learned better and faster, while facial features are more difficult to learn than the dress.

So the first thing is to prepare the data, for example in Photoshop. If I have a set of pictures like this, I would center the subject, unlock the layer, and use something like Remove Background so the background is removed; then press C to crop, choose a 2:3 ratio, center the object I want to learn, and press Enter. Then I press Ctrl+Alt+Shift+W to export the image, choose a good quality, and save. We apply this to each of the pictures.

Because we only have four pictures, we also need to augment the data. I would create a 1:1 crop and cut the head off, for instance, because I don't want the head to appear in all the pictures; if it appears in all of them, the head will be learned faster. So I generate more pictures with this methodology, by cropping. It is also possible to rotate: rotation augments the data as well and helps the model learn more poses. This is just an example of data preparation.

Assume the data is prepared and we have four pictures like this, at a 2:3 (not 3:2) aspect ratio, plus some square images. Like I've said, it is actually better not to show the face in the pictures, but my dataset is very small and the picture quality is limited. Next we go to BIRME. The more pictures we have the better; in my case I don't have many, so here we will use 512x512. BIRME is used to resize the data to sizes like 512x512, 512x768, etc. I scale the images down to this size, then save them and store them in this location.
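As a rough illustration of the cropping and augmentation steps described above, here is a minimal Python sketch using Pillow. The file names, crop offsets, and rotation angle are all hypothetical examples, not something from the video:

```python
# A small augmentation sketch with Pillow; file names and offsets are examples.
from PIL import Image

img = Image.open("dress_01.png").convert("RGB")
w, h = img.size  # assume a portrait-orientation photo, taller than wide

# Center-crop to a 2:3 (width:height) ratio, mirroring the Photoshop crop above.
crop_w = min(w, h * 2 // 3)
left = (w - crop_w) // 2
img.crop((left, 0, left + crop_w, crop_w * 3 // 2)).save("dress_01_2x3.png")

# A 1:1 crop taken below the head, so the face does not repeat in every image.
head_offset = h // 5
img.crop((0, head_offset, w, head_offset + w)).save("dress_01_square.png")

# A slight rotation adds one more variation and helps the model see more poses.
img.rotate(8, expand=True, fillcolor=(255, 255, 255)).save("dress_01_rot.png")
```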
Next is the second dataset, which looks like this; I want it to be 512x768, so I save it as well, extract it, and combine everything into one folder. Once the images are combined in one folder, we can start the captioning. I am only going to use 12 images. Like I've said, it is better to increase the number of images with rotations and similar tricks and to reduce the number of images where the face actually appears, because I don't want to learn the mannequin quickly; I don't want to learn it at all, if possible.

We apply the same thing to the second dataset, the jeans, and get something like this. Notice that in the jeans set the face only appears twice; this is a real person, and because the face appears only twice, it will barely be learned in comparison to the jeans, which appear in all the pictures. My subject must appear in all the pictures, while everything else must not: for example, if I had a white shirt in all of the pictures, it would learn the white shirt as well.

Now we copy this data into the training folders. Like I've said in the first video, we have an image folder, a log folder, and a model folder, and the image folder is where we place the subject folders. We can use a number of repeats, for example 10; let's assume we name our target "XYZ blue jeans" and place the pictures there. Then we add another subject: we can train multiple subjects in the same LoRA by having a folder for each of them, and we can use a different number of repeats for each, for example 5, 8, or even 10 — "XYZ blue dress" alongside "XYZ blue jeans", because those names are slightly more meaningful. The second dataset goes here; this is my second dataset, and I place it here (viewing with large icons), as shown in the sketch below.

In general, I don't recommend training more than one subject in the same LoRA, because we have less control over the parameters. One subject might require 1,000 steps to train while the other requires 1,500 steps, so it is not fair to put both subjects in the same LoRA and expect both to converge properly with the same number of steps. It is better to train each subject in a different LoRA, but in this example I want to show a case where we train more than one.

In this jeans set the face appears only twice, so it will require fewer steps to train; I can use 8 repeats there, while for the dress images I would use 10 repeats. Why don't we use something like 40? You can test this and see that if we increase the repeats to 20, 30, or 40, it converges much faster and we might skip past the right epoch. It is better to reduce the number of repeats and increase the number of epochs, so we can tell which epoch is the best one to use. So always use a lower number of repeats.

For training subjects such as clothes, the principle is the same: the LoRA will learn everything that repeats in the images. It will learn this button, it will learn blue jeans with torn parts — whatever repeats will be learned. But it will also learn everything else; for example, it will learn the face, though depending on the number of steps it will learn the face much more slowly than the jeans.
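A sketch of the folder layout described above, with one subfolder per subject and the repeat count encoded in the "N_" folder-name prefix that Kohya reads. The root path is an assumption; the repeat counts follow the video:

```python
# Build the Kohya SS training layout: repeats are read from the "N_" prefix.
from pathlib import Path

root = Path("training")
for sub in ("img/10_XYZ blue dress",   # dress set, 10 repeats
            "img/8_XYZ blue jeans",    # jeans set, 8 repeats
            "log",
            "model"):
    (root / sub).mkdir(parents=True, exist_ok=True)
# Each subject's images (and later their .txt captions) go in its own subfolder.
```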
On the other hand, when it comes to the dress, we have more faces, so the faces will be learned much faster. As I've said, it is recommended not to have too many faces in the dress dataset, or to blur them; you can blur the other faces and the LoRA will still understand that this is a dress worn by a woman, or by a person, because Stable Diffusion has already learned the general concepts of clothes and people.

Now, it is possible to train with a classification (regularization) set or without one. What a regularization set helps with is reducing the effect of the person, for example. What kind of classification images would I use in this case? We have to use people wearing dresses: I don't want to learn the concept of this particular woman, so my classification images should ideally be of a woman wearing a dress. Because I don't want to learn the face that exists in my dataset, I would use new faces in the classification set, so the model keeps those generic features. It is also good for the classification images to be of good quality: the better the classification images are, the more likely the LoRA will turn out better as well. It is acceptable to include images with some deformations; that is not very important, because the model will not learn these images as strongly as it learns the training set. The purpose of regularization is to regularize — to disrupt — the learning process. I will test with regularization and without it and show the difference; it is possible to learn the concepts in both cases.

Now our dataset is ready and we can launch the Kohya SS graphical user interface to start the training process. The first thing to do is caption: go to Utilities → Captioning. I will use WD14 captioning, because it is the one used more often with Stable Diffusion checkpoints. We choose the folder we want to caption; the first one is the blue jeans. Because we have multiple concepts, it is better to add a trigger word — each subject should have a different trigger word. The trigger word can also be the folder name itself. If I had a single subject, the trigger word would not be that important, because the instance prompt name is itself the trigger word. So, for example, I add "XYZ blue jeans" followed by a comma as the prefix.

Ideally, I should remove from the captions everything that I want to be part of the LoRA — for example "blue jeans". But if I want the jeans to be able to change color, it might be better to keep the color tag, although it does not always work as we expect; for now I will leave it as it is. If I were creating a character, I would also remove tags like "1girl", "solo", "realistic", "looking at viewer", etc. in the undesired tags field (don't put spaces between the tags there) — anything that repeats: the eye color ("brown eyes"), the hair color ("black hair"), and so on. Anything I want to be part of the LoRA should be removed via the undesired tags. I can also remove tags later using special software that can double-check the tags and remove them; we will see this step.
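The trigger-word prefixing that the captioning step performs can also be sketched in a few lines of Python, in case you ever need to add a trigger to existing .txt captions by hand. The folder path follows the hypothetical layout above; the script itself is just an illustration:

```python
# Prepend the trigger word to every caption file in one subject's folder.
from pathlib import Path

trigger = "XYZ blue jeans"
folder = Path("training/img/8_XYZ blue jeans")   # assumed path from the sketch above

for txt in folder.glob("*.txt"):
    tags = txt.read_text(encoding="utf-8").strip()
    if not tags.startswith(trigger):             # don't prefix twice
        txt.write_text(f"{trigger}, {tags}", encoding="utf-8")
```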
So we choose the folder, start captioning the images, check here, and wait for the captioning to complete. Once it is done, I review the captions: for example, there are "torn pants" and "torn jeans", and I don't want these in the captions because they are part of the LoRA — I don't want to have to repeat these words when I prompt. Then we choose the second dataset, the XYZ blue dress, change the trigger to the blue dress, caption the second dataset as well, and wait.

Once the captioning is complete, I open a dedicated tool such as BooruDatasetTagManager. This software has some issues, some problems, but it is still useful. Go to File → Load folder and choose the blue jeans folder; I can now see my dataset. What I don't want are the two torn tags, "torn pants" and "torn jeans", and I can remove them from all of the images at the same time by going to "show all tags". Why do I want to remove them? Because I want the torn parts to be part of the LoRA. Removing them is better — I could keep them, but it is better to remove them. So I remove "torn", and I fix the trigger: there is a typo, "XY blue jean", so let me rename it — remove it and re-add "XYZ blue jeans" at the top. Now if I check each image, the tags read "XYZ blue jeans". If I don't want to change the color of the clothes, I can also remove "blue pants"; anything tied to the character I can remove — its color, "blue pants", and so on. However, if I do want to change the color of the dress later, it is preferable to keep the color tag. Because I know this is a pair of pants, I can remove "pants"; I can remove "jeans" as well. I think it is fine like this, so we press Save. Sometimes we get an error here; if we don't get any error, that's great. I usually close and reopen, because this software is not very stable.

Then I load the second folder, the blue dress, and apply the same procedure. There is a typo in "blue" here, so we correct it by removing the tag and adding "XYZ blue dress" once again at the top. I can remove "dress" because I know it is a dress; I would keep "blue dress" only if I wanted to change it to red, for example, but if not, it is better to remove "blue dress", because it is part of the LoRA as well. The result looks acceptable, so we Save All.

Now our dataset is ready: it is captioned and pruned. What we do in pruning is remove the tags that belong to our subject — anything that relates to our subject should ideally be removed, though it is not strictly necessary. Captioning is always better than not captioning, and it is always good to remove the features we want the LoRA to own.

With the dataset ready, I go to the LoRA tab (please don't confuse LoRA with Dreambooth here). I usually use a configuration file to speed things up rather than redoing the settings all over again; configuration files let us repeat the same setup without having to input each parameter every time.
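What the tag manager does here — remove unwanted tags from all captions and keep the trigger first — can likewise be done with a small script. A sketch under the same assumed paths, with the tags pruned in the video:

```python
# Remove unwanted tags from every caption and keep the trigger word first.
from pathlib import Path

trigger = "XYZ blue jeans"
unwanted = {"torn pants", "torn jeans", "jeans", "pants"}  # tags pruned in the video

for txt in Path("training/img/8_XYZ blue jeans").glob("*.txt"):
    tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",")]
    kept = [t for t in tags if t and t != trigger and t not in unwanted]
    txt.write_text(", ".join([trigger] + kept), encoding="utf-8")
```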
every time so what we'll do is open for example and just load the configuration file now the the model that I will use is Photon for example is possible to use any kind of model that's totally fine from 1.5 Etc but for example can give some really good photorealistic images for example so we would choose the the checkpoint from this location then we check the folders now it's possible to run with the classification images and without I think it I will test for classification and without initially I will run with the equalization set now why do we use regularization regularization is Just Like A disruption of the training process like I have said before now it's also possible not to use regularization Let's test it without regularization first Xyz clothes now if I have one subject I would name it related to that subject I don't want to spend much time on this video now parameters I would use standard I often actually use licorice law cone because I I saw that it produces better results but since most people use standard I will use standard now the only difference between local and the standard is just the convolution and the network ranked so in low current we would reduce the network rank to under 32 for example use Network Alpha 4 convolution for one now for the standards I would use basically the same settings I'll adjust you now when I'm training a subject that has many images for example hundreds of images I would use 128 and I would use lower Alpha for example 64 or 32. now when we bought that if we are training a simple subject such as Globe which has 10 images or 12 images it's better to have 32 maximum okay there is no need to have more than 32 Network rank okay we can use 32 16 or 8 that would make the file size smaller and I think that is much better too much freedom for the Laura is not really that good because having a larger Network rank it will it will help the model learn more features but it can also make it a overfit faster now regarding the settings I would use training patch one because my GPU is limited now if we use more than one it's recommended okay it's okay to use more than one if you have a strong GPU regarding the a box usually we don't really need that many ebooks we can or we I will use 10 airbox because this is a simple subject of a clothes so I will stop in the training when I see it starting to overfit save everyone a book now the mixed precisions I will keep it BF because I have RTX graphics card now the seed is very useful actually because it allows you to compare different models so if I want to compare this model with another model it's better to use the same seat okay for example I could also use this seed once again in stable diffusion to generate the images that were generated by the koi itself so the scene is used actually in many locations and the Aquarius was really good to keep it now I would use the same default settings of cosine and learning rate the cosine will start going down to zero it's really useful there's no need to to change that these things they basically produce similar results I tried different results and I've seen that it's cosine constant to produce good results and they are fast enough they are faster than other Factor some might work in better better than others depending on the metal and the data used so there's no rule of thumb here I would also use the optimizer Adam W8 bit which is really fast enough and produces good results it's very reliable now if I used other factor for example here I need to put other Factor here as 
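The GUI assembles a kohya-ss/sd-scripts command behind the scenes; here is a rough, hand-written sketch of an equivalent invocation with the settings discussed above. The paths, the model file name, the alpha value, and the learning rate are assumptions, not the GUI's exact output:

```python
# Hypothetical sd-scripts invocation matching the settings above.
import subprocess

subprocess.run([
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path=models/photon_v1.safetensors",  # assumed file name
    "--train_data_dir=training/img",
    "--logging_dir=training/log",
    "--output_dir=training/model",
    "--output_name=xyz_clothes_v1",
    "--network_module=networks.lora",       # "Standard" LoRA in the GUI
    "--network_dim=32",
    "--network_alpha=16",                   # assumed; the video does not fix a value
    "--train_batch_size=1",
    "--max_train_epochs=10",
    "--save_every_n_epochs=1",
    "--mixed_precision=bf16",
    "--seed=1",                             # the seed later reused in A1111
    "--learning_rate=1e-4",                 # assumed default-style value
    "--lr_scheduler=cosine",
    "--optimizer_type=AdamW8bit",
], check=True)
```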
Now I will keep the remaining defaults, which are really good enough. The maximum resolution in my dataset is 512x768. Kohya will usually read this value from the dataset, so it is not strictly necessary to set it, but it is a good habit to enter it. Enable buckets is for when we have pictures with different resolutions, as in my case, where I have both 512x768 and 512x512 images, so I enable buckets. Network rank 32 — we don't need more than 32, even with two subjects, when it comes to clothes, because we only have a couple dozen images. In the advanced configuration I keep everything as it is, and I keep clip skip at 1 because this is a realistic model.

It is also possible to use color augmentation and flip augmentation. If the colors of my subject are not very good, I might use color augmentation, which adds random saturation or hue shifts to the images. Flip augmentation randomly flips the image horizontally: in one epoch the model sees this orientation, in another the mirrored one, so it effectively learns the same subject from both directions. This is an augmentation of the dataset itself, and it is useful for subjects that are symmetrical; if my subject were symmetrical, flip augmentation could produce better results, even with people. But it makes the training take slightly more time, so I won't use it.

One important thing is to set the sample image prompt — always fill this in, because you really want to see the output of each epoch so you know when to stop the training. Most of the time we don't need to run all 10 epochs; sometimes we only need three, and we stop when the pictures start to overfit, become overcooked, or break. So I write "masterpiece" and, because we have two subjects, I must specify which one to display: "XYZ blue dress" — if we don't put a trigger word, it will generate something random, since there are two subjects. Keep the prompt very simple, with a simple negative; if we clear this field you can see the default, so we basically use the same default and just add our subject's trigger word. I make the sample 512x768 because I want something like a full-body or cowboy shot (the prompt-file format is sketched below). Once we are done — note the epoch count changed to 9 without my intention, which is exactly why reviewing the settings is useful; set it back to 10.

Regarding the folder names: because I am training jeans, we can put the class name as "jeans", which is also a subclass of "pants" and of "clothes"; so the class name here could be "clothes", or just "jeans", or "pants", since "pants" is more general than "jeans". The other subject is also a piece of clothing — it is a dress — so its class name is "dress". We should ideally put the class name "dress"
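For the per-epoch sample image, sd-scripts reads a prompt file where `--n`, `--w`, `--h`, `--s`, and `--d` set the negative prompt, size, steps, and seed. A sketch of the prompt described above; the exact negative-prompt text and step count are assumptions:

```python
# Write the per-epoch sample prompt in the sd-scripts prompt-file format:
# --n = negative prompt, --w/--h = size, --s = steps, --d = seed.
sample = ("masterpiece, XYZ blue dress, cowboy shot "
          "--n low quality, worst quality --w 512 --h 768 --s 20 --d 1\n")
with open("training/sample_prompt.txt", "w", encoding="utf-8") as f:
    f.write(sample)

# Referenced at train time together with the dataset resolution and bucketing:
#   --sample_prompts=training/sample_prompt.txt --sample_every_n_epochs=1
#   --resolution=512,768 --enable_bucket
```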
here, and "jeans" in the other location, before starting the training. Then we start the training and check the results. As I've said, this is a simple subject, so we might only need approximately 1,000 steps; this differs from one subject to another, so we need to test to see how many steps we actually need, but usually we don't need too many steps for a simple subject with a limited number of images.

During the training, check the sample pictures in the model folder's sample subfolder. We can see that the generated dress looks like our target, and the face here is not the mannequin's face — it is just a generic girl, because the face does not repeat in all the images. The second picture looks somewhat like our dress, but not that much, because of this extra element. At the third epoch we can see that this is our dress: the LoRA has learned the garment, and the person in the picture is still not a mannequin, so this looks good. At the fourth epoch, however, it starts to look more like a mannequin — it starts to converge toward the mannequin — and at the fifth epoch we see some kind of overfitting, because it has learned the mannequin. As I've said, it is better to have only two or three mannequin shots in the dataset and to crop the face out where possible. So I would say epoch 3 looks good enough. We can stop the training and hope that the jeans were learned properly as well, since we cannot see the jeans in the same sample. I will stop the training now.

For the second run I will use the same settings but also include a regularization set, to see the results with regularization; it is always informative to try both with and without. I will name this version v1-reg, keep everything else the same, and just change the model name and add the classification folder. In my classification set, as I've said, I have women in dresses and in jeans. The classification set should ideally not contain torn jeans; for the jeans class I want something different from my subject, for example even yoga pants, and different types of dresses. It is fine if some of the dresses are similar to some extent, but if a pair of jeans in the set looks just like mine, I remove it — that is much better, because I don't want the same jeans in the classification set. It is better to have different types of jeans and pants than the ones in my training data. The classification images can be different sizes, for example 512x768 or 512x512. If there is nudity, remove it: we are training clothes, so we don't need any kind of nudes in the pictures.

Once we are satisfied with the classification set, we run the training again using the same settings. The first thing we see is that the number of steps has doubled: with a regularization factor of 2, the number of steps doubles. We can also check how many images are being loaded and how many subsets will be trained. With regularization the training takes longer, but the results can be slightly better. So we can see that the number of steps has doubled.
Now we have subset 0, subset 1, and subset 2 — three subsets: one for my first subject, one for my second subject, and the third for the regularization set. The regularization set will only use a limited number of images: 12 images multiplied by 10 repeats is 120, plus 14 images multiplied by 8 repeats is 112, which gives 232, so it will only use 232 regularization images. Add the 26 training images to that and we get 258: what is loaded into the cache is 258 images, the 232 regularization images plus the 26 images we are training on.

As before, we wait for the output so we can compare the run with the regularization set against the one without. We come back after a while — 20 minutes, say — and check the output of each epoch. At epoch 1 it is learning the concept: this is not a mannequin, but it has an Asian face, most likely because the classification set has more Asian faces, though it could also come from the base SD model. At epoch 2 it is learning the pattern, though in a different direction; epochs 3 and 4 are better; epoch 5 is in the right direction but slightly different; and at epoch 7 it starts to change. Based on these results, the run without regularization actually looks slightly better, because it resembles the subject more. So the regularization run was not a great success — it is acceptable and produces good results in general, but the regularization set itself can affect the output and may not give us exactly the results we want. We stop the training at this point, because the face is starting to look more like the mannequin and we are starting to get extra artifacts on the dress. Epochs 4 through 6 look acceptable, so we can test them further using Stable Diffusion in AUTOMATIC1111.
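A quick check of the image accounting worked out above:

```python
# The image accounting for the regularized run, as described above.
dress = 12 * 10                  # 12 dress images x 10 repeats = 120
jeans = 14 * 8                   # 14 jeans images x 8 repeats  = 112
reg_used = dress + jeans         # 232 regularization images are actually used
cached = reg_used + 12 + 14      # 258 images end up in the cache
print(reg_used, cached)          # -> 232 258
```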
For testing in AUTOMATIC1111, I often use basically the same prompt that exists in the training samples, with slight modifications. So this is the prompt; we add the LoRA, which is xyz_clothes version 1, and use the same negative prompt — it is a very simple prompt, so that's fine. I set the steps to 30, and since we want full-body shots, 512x768. We can change the prompt slightly, for example "young woman wearing an XYZ blue dress, cowboy shot, looking at viewer, simple background", or drop the simple background and use "in a street" instead.

It is very important to do the comparison using the XYZ plot, and it is also very good to use ADetailer: what ADetailer mostly does is fix the faces by automatically inpainting them — I've explained this in another video. In the ADetailer prompt, instead of a cowboy shot I use a portrait; I can use the same LoRA there, or remove it and rely on the faces that exist in Stable Diffusion itself: "young woman, portrait, looking at viewer, in a street". In ADetailer I only want the portrait — I don't care about the dress at all there.

So we go to the XYZ plot and compare the epochs: I have this one, and I want to compare it with epoch 4 and epoch 5, then check the regularization run starting from epoch 4, then 5, then 6, then 7, and so on. Sometimes I start with just a couple of epochs — say 3 and 4 — and if 4 is fine I move to 5, because the testing can take some time; likewise for the regularization run (4, 5, 6, and if 6 is fine, 7, and so on). Sometimes the results we obtain in Stable Diffusion are better than the samples from Kohya SS. If we want to reproduce the Kohya samples exactly, we have to fix the seed to 1, because that is the seed we used in Kohya SS; but I don't want that, since I want more flexibility with the model.

This is the comparison between the various epochs and models. Next I want to check changing the colors: I deliberately left "blue" in the captions because I want to see whether I can recolor the dress, so I change it to red and emphasize "red" a bit. I also want to vary the subject itself: instead of the blue dress, I include the blue jeans. This is how the XYZ comparison is performed: for the axis type, instead of the default, we choose Prompt S/R (search and replace), define the rows and columns and which models we want to test, then start the run and see the outcome. I will run this in the background, because it can take a couple of minutes; but if you just want one quick test, put in something like this and look at the results. For epoch 3 of XYZ blue dress we obtained an excellent result on the first attempt: remember, this was actually a mannequin, and now it is a real person — or rather an AI-generated real person. Now I will go to the XYZ test and show you the final results.
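Prompt S/R in the XYZ plot is plain search-and-replace over the prompt: the first value is the search string, and each subsequent value is substituted for it in turn. A minimal Python imitation of what the axis does, with the prompt abbreviated from the one above:

```python
# "Prompt S/R": the first value is searched for; each value replaces it in turn.
prompt = "young woman wearing an XYZ blue dress, cowboy shot, in a street"
sr_values = ["XYZ blue dress", "XYZ blue jeans"]

for value in sr_values:
    print(prompt.replace(sr_values[0], value))
```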
I actually generated the different results in stages. Initially I tested just a couple of epochs, and when I saw that the results were acceptable even at epoch 4, both without and with regularization, I started to increase the number of epochs and include more models in the tests. You can see here, for example, that with "blue" the output resembles the target in some cases and not in others; that is very normal for Stable Diffusion — we generate different images and pick the ones we find better. When we try to change the color, it changes in some cases and not in others for the dress, and in some cases it generates extra things we don't want. When it comes to regularization, the color change works better with it, but in terms of similarity the run without regularization looks slightly better. For the jeans, when I tried to change their color, only the color of the shirt changed; this means we should ideally test the jeans in a separate LoRA and see which results produce jeans in different colors — it might also be related to the prompting. In general, though, the results are really good.

We can see another example; I added more epochs, and this is epoch 5 — even at epoch 5 it is still good. In Kohya SS, the epoch 5 sample was a mannequin, but with random seeds or different prompting the mannequin disappears; in this case it has disappeared. Remember this was actually a mannequin — see, this is the mannequin — and now we have a real person, or an AI-generated real person, and the dress is really good; our dress is trained properly. With regularization, the results are also acceptable and good, though not always as similar to the target as without regularization; still, they are good in general and more flexible — we might just need to train more.

This is an example in which a mannequin appeared: depending on the seed, the mannequin sometimes shows up, and it is also related to the prompting. When I saw this example appearing, I specified the hair — for example "brown hair, long hair" — and the mannequin started to disappear from the generated images. We can also see that the regularization run at epoch 6 is better than at epoch 5 and resembles the target more. Here, for example, I prompted for red and only the color of the shirt changed — you can see different types of shirts; with regularization, on the other hand, it is more flexible, though the accuracy might not always be as good as we hope.

This is a back shot, from behind. My dataset does not contain any person from behind, except for the jeans, so this is something Stable Diffusion has synthesized from the learned patterns: it created a person from the back wearing this dress even though no such image exists in the data. That is a real advantage and shows that our model has learned the dress properly. Similarly, we can try different models and different shots. Here I tested epoch 7 and saw that the model is still learning even there, so epochs 8, 9, and 10 might produce even better results; I just did not want to spend so much time training and testing, but the principles remain the same. This is epoch 7 in a street, for example — the dress at epoch 7 is really good.
The jeans stay consistent while only the shot changes. With regularization, though, you can see the patterns on the jeans starting to disappear, while without regularization the patterns remain. This means that with regularization we need more training — we might need to train up to epoch 10 or 15 until we get what we want, because we don't want half the pattern, we want the complete pattern — whereas the dress was done around epoch 7. So one subject requires a different number of steps and epochs than the other, and that is exactly the point: training separate LoRAs is more appropriate than training different subjects inside the same LoRA. In general, the results were really good: the LoRA was able to capture the clothes and their pattern, remove the mannequin, and remain very flexible. These results are not cherry-picked — I just generated a couple of samples and showed all of them, without removing any defects or selecting seeds; the first few seeds I tried gave these results. So the results are really great, and the same principles apply to all LoRA training of subjects. Have a good day.
Info
Channel: How to
Views: 30,496
Id: wJX4bBtDr9Y
Length: 44min 10sec (2650 seconds)
Published: Thu Jul 27 2023