LORA training EXPLAINED for beginners

Captions
To train stuff you need a dataset, Kohya SS, and a good enough GPU. As long as you have enough VRAM to run AUTOMATIC1111, you should be able to train. If you don't, and still want to train stuff, or want to train extremely fast, we have a solution for you, but more on that later.

First we will install Kohya SS, if you don't have it already. Go to the page I leave in the description. The Windows requirements are here, but if you have AUTOMATIC1111 you should meet them already; maybe you want to install Visual Studio, though. To install, create a Kohya folder wherever you want it. From that folder, click on the path bar and type "powershell"; you could also type "cmd" if you want. Now, on the GitHub page, click the little icon on the top right, which copies the setup commands; just paste them into the terminal and run them. After a short time you will be prompted with 5 options: choose "Install Kohya SS GUI" by typing 1. Then you need to select either Torch 1 or Torch 2; if you can use Torch 2, pick that one, as it is faster. I can't, so I'll use Torch 1. Now wait some time for it to install. After it's done, if you have an Nvidia GPU (an RTX 3000-series card that supports cuDNN), then type 2.
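By the way, if you want to double-check what hardware your training will actually run on, a tiny Python check like this works. This is not part of Kohya SS, just a sanity check, and it assumes PyTorch may or may not be installed on your system:

```python
# Quick sanity check before training: can PyTorch see a CUDA GPU?
# (Hypothetical helper, not part of Kohya SS; degrades gracefully if
# torch isn't installed.)

def describe_accelerator() -> str:
    """Return a short description of the available training device."""
    try:
        import torch
    except ImportError:
        return "PyTorch not installed"
    if torch.cuda.is_available():
        return f"CUDA GPU: {torch.cuda.get_device_name(0)}"
    return "CPU only (training will be very slow)"

print(describe_accelerator())
```

If this reports CPU only, training locally will be painfully slow, which is exactly the situation the sponsored service later in the video is aimed at.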
I can't really check, because I don't have one, but that should install it and make your training way faster than mine. I'll just close it for now. Before opening, I double-click the upgrade file to see if it updates anything; it went by in a flash, so I'm just going to open the gui.bat file, which works the same way as the AUTOMATIC1111 web UI: copy the IP into your browser like usual. Now you are good to follow the tutorial.

Before we get started: this is pretty much a collaboration with Latov, who has a lot of experience training LoRAs, and thanks to him we have this tutorial. No one knows everything, though, so if you think we are wrong about something, please comment it down below. And that's it, let's go.

First of all, LoRAs are little files that save information about something they've learned. You can, for example, train a character named Karen. Of course, if you just type that name into regular Stable Diffusion, it will create something completely random; with a LoRA, it will understand what the name means and will be able to make the character with any model you want. It's like a mini dictionary that tells Stable Diffusion what a new, unseen word and its data mean. The best part is that LoRAs are very small in size, so you can send one to a friend and they will be able to create the same character, and you can train pretty much whatever you can think of.

The only thing you will need is a dataset, preferably with captions: the images are the data, and the captions are the descriptions of every single one of those images. The images serve as information, but for the AI to understand what that information means, or what we want out of it, the captions need to explain it very clearly. Choosing the right images is key to getting good results. The best possible option is a large quantity of good-quality images, but if you need to choose, like we had to with this dataset, choose quality, meaning not only resolution-wise, but also having a clear subject that stands out and is
easy to understand. I will talk mainly about characters, but the basics apply to everything. Try to find as many images as possible that keep a certain quality standard. Usually about 20 are recommended, but it is possible to train with fewer, and better to train with more; the more complex your subject is, the more images you should use. Put the subject in different scenarios, lighting, and poses, showing different emotions, and, if you want to be able to change outfits easily, different outfits too. Also have close-ups, general shots, etc., and even different styles, if that is important to you.

I will be training this character as an example. You will see that most of my images are in square format; it is not really necessary, though. In my case the images are 1024x1024, mixed with some that have a vertical aspect ratio. You can train at lower resolutions and get good results; I have them at high res just in case at some point I want to train at that size, but 512x512 is enough. If you see that I have cropped images, it is because otherwise my dataset would be too small: you can crop a character in different parts to create a bigger dataset, or even take the same pose, duplicate it, flip or rotate it, and use it as a new image. If an image is very low quality, try upscaling it a little bit, as long as it doesn't lose the original character; when in doubt, use the Extras tab instead of img2img.

Notice that all the images have a pretty clear main subject that is easy to see. You can edit an image to separate the subject from the background in a clear way. Create these images however you want: generating with AI, like our character, taking photos, drawing, using 3D, or the more standard way, using online images; but if you do that, please ask the original creators for permission. Don't use images that are overdetailed and hard to understand, and the same goes for images of super poor quality: it is better to leave them out than to add them just because you
don't have enough images. And don't put in stuff that isn't consistent with the subject you are training: the more consistent your subject is across all your images, the easier it will be to get good results. If you have a subject that looks really different depending on the situation, don't worry, because that's why we have captions.

Captioning tries to describe each image so the AI can understand its content. It also helps us guide what the subject is, what we want to change about it, and how we're going to use the final LoRA. To caption correctly, we will use keywords, also known as trigger words, to define the subject we want to train. These should be unique words that don't already mean something: don't name your character "bench", because that's already a thing; maybe call it "bmch". The more unique your trigger word is, the better, and try to keep it as short as you can. This trigger word must be in every caption, for all the images, as long as your subject appears in them, which should be pretty much always.

Aside from the trigger word, the captions should describe everything in the image except the subject. So to caption this image I would need two things: a trigger word, in my case I named the character "skere", and a description of everything in the image except the character, so this one would just be "white background". You could add more things, though; this depends on what you consider to be an innate part of the character and what isn't. For example, if you think the pink hair isn't part of the character, because sometimes it will be brown, then you should caption "pink hair"; or maybe the kimono isn't part of it, because she will change clothes, so describe that too. In my case, I consider the whole set of traits to be the character.

You know what, let's try to see this from an AI's point of view, so we get how to caption properly. I will show you this image. What do you see? There is a lot of stuff, right? Would it help if I said that
there is a "popimpokin" in here? Probably not. Even if I said there is one in all these other images too, you still wouldn't know what a popimpokin is. Giving you more images probably helped you discard some options: now you know that it isn't a woman or a style, but could it be a place?

Let me describe what I'm seeing. In the first image, I see a woman sitting at a bar and drinking a glass of juice; there are four green bottles, a popimpokin, some windows, and a bar table. In the second image, I see a bar building made out of wood, a bar table next to a group of stools, and a very large group of bottles next to a popimpokin. Lastly, I see a product photo of a popimpokin on a table, with a blurry background of a bar with some stools and blue lights. With this you probably know that with "popimpokin" I'm actually referring to this bottle. So if you were to paint a popimpokin in the future, you wouldn't be drawing a bar stool, you would be drawing this yellow bottle.

You also see an example of how variation in images can help or hurt understanding. In the first image you could see the bottle, even though it probably wasn't easy; this might help give the AI an understanding of where you would normally see this subject in a regular image. The second image, instead, just confuses you: it is really hard to see the bottle, and even if it is a realistic place for it to be, it is not clear what it is, so you should not use that image in the dataset. And finally, in the last image you can see a clear focus on the main subject. Without a clear description, though, it could still get lost: when I first presented the images, you could guess I was referring to this juice bottle, but you would have no idea if it really was our subject; maybe "popimpokin" was referring to the stools in the background, as they were also part of every image. This is why it is very important to have a description of everything but our main subject. And if your subject will
change a lot, you should describe the change with a new keyword, or at least describe it. Let's say there were different types of popimpokin, like a blue version: you could use a keyword like "b_popimpokin". Make sure it is just one word. Now, "popimpokin" is probably not a great keyword, because it's very long; I just like how it sounds.

If you didn't notice, the style of captioning you use will vary depending on what model you're training your images on. If you're captioning for photography, you will use BLIP-style captioning, which is basically a more fluid way to describe things, like natural language, for example: "a keyword standing in front of a crowd in a football field". For anime or cartoon styles, we will use the Danbooru style of captioning, which uses tags separated by commas; the same example with booru tags would be "keyword, standing, crowd, football field". When describing the overall image, try to use words or concepts that are already understood by the AI: don't use some obscure term it has never seen, but instead something like "japanese festival". If you use auto-captioning, mainly when training characters, objects, or concepts, clean up your caption files very carefully; this can make or break your training. And don't over-describe stuff: think of it like a prompt. If it has too much stuff, the AI will ignore most of it and just get confused.

Let's use this skere character dataset, and I'll show you how I captioned it. The first thing I'm going to do is rename all the files to have the same name, so I'll select all of them and change the name to "skere". Make sure that you have no duplicate names; this can happen if your image extensions are different, so convert every image to the same format: either all PNGs or all JPGs, but try not to mix them. Let's get a little head start by auto-captioning these images. I will use WD14 captioning, which is already integrated in Kohya SS: go to Utilities and you have WD14. It is super easy to use; you just import your folder's path.
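To make the two caption styles concrete, here is a tiny sketch (the function names are made up for illustration) that builds both styles, always putting the trigger word first:

```python
# Two caption styles for the same image, as described above.
# "skere" is this tutorial's example trigger word; the trigger always
# comes first and everything EXCEPT the subject gets described.

def booru_caption(trigger: str, tags: list[str]) -> str:
    """Danbooru-style caption: trigger word, then comma-separated tags."""
    return ", ".join([trigger] + tags)

def natural_caption(trigger: str, description: str) -> str:
    """BLIP-style caption: trigger word inside a natural-language sentence."""
    return f"{trigger} {description}"

print(booru_caption("skere", ["standing", "crowd", "football field"]))
# skere, standing, crowd, football field
print(natural_caption("skere", "standing in front of a crowd in a football field"))
# skere standing in front of a crowd in a football field
```

Same information, two registers: comma tags for anime models, flowing sentences for photographic ones.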
You could click Generate Captions right away, but I will use these options first. "Prefix" is the first tag that will be placed in every single text file; we want this to be our keyword, in my case "skere", with a comma after it. "Postfix" I don't care about. With "Undesired tags" we will prevent some tags from being generated; I think of the character traits the AI will most likely bring up: hair flower, pink hair, blue kimono, blue yukata, yukata, kimono, sandals, black band, solo. Now you can click Generate, and you will see how every image gets its own text file, with its corresponding name, filled with tags describing it.

To clean this up, you can edit the files directly, or you can use an application. I'm going to use BooruDatasetTagManager; you can download it from the link in the description, just download the ZIP file and you're set. It is a very good tool for anime, but it can also be used for realistic. From here we will again open the dataset folder's path, and that will import all our images into the program with their matching descriptions. On the left we can click the image we want to tag, in the center we have the tags of the current image, and on the right we have all the tags in use, combining every single image. The first thing you want to do is check those: see if you find any tags referring to your character; in my case there were a lot. You can take them out by clicking the red cross icon. I will also change the tag "1girl" to "woman"; this could become an extra keyword, as it is in every image. Then I'll go over each image and see if there are extra tags that shouldn't be there, or if I have to add new tags that are missing; I'm going to add words like "motion blur" and "concrete" to this one. Here you can see all the possible tags and the number of uses they have; the more uses, the more likely it is that Stable Diffusion understands them. Just do this for every image. You can double-click an image to open a preview window and see things more clearly.
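That cleanup step can also be scripted. This is just a sketch (the function name is made up, and the blacklist mirrors my undesired tags) of stripping unwanted tags out of one caption line:

```python
# Cleaning auto-generated captions: drop tags that describe innate
# character traits (those belong to the trigger word) and dedupe the rest.
# Hypothetical helper; the blacklist mirrors the tutorial's undesired tags.

UNDESIRED = {"hair flower", "pink hair", "blue kimono", "blue yukata",
             "yukata", "kimono", "sandals", "black band", "solo"}

def clean_caption(line: str, undesired: set = UNDESIRED) -> str:
    """Remove blacklisted and duplicate tags from a comma-separated caption."""
    kept, seen = [], set()
    for tag in (t.strip() for t in line.split(",")):
        if tag and tag not in undesired and tag not in seen:
            kept.append(tag)
            seen.add(tag)
    return ", ".join(kept)

print(clean_caption("skere, 1girl, pink hair, kimono, white background"))
# skere, 1girl, white background
```

You would run something like this over every .txt file in the dataset folder, then still eyeball each caption, since an automated pass can't judge what is actually in the image.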
I'll let you see the tagging I made so you have a better idea, even though my captioning might not be perfect. A very important word to add is "cropped": use it on images that don't show the head, or that are just one part of the body without really showing the whole thing. Of course, save the changes you make; your text files should now have the same tags as in the program. If you were tagging for a realistic model, your tags might look a little like this; as you can see, the language is more natural.

There is one last thing you can use for training: regularization images. They are completely optional, but can give your LoRA a little more flexibility, mainly when you have a small dataset. These images act as a reference for the AI to understand what type of subject you're trying to train; you will need images that represent your character's class. I think the best way to explain it is with examples: for our anime woman we would use "woman", for a king character you could use either "king" or "man", and if I were to train popimpokin, I would use "bottle". Try to create regularization images with variety, same as what we want in our dataset: close-ups, images from afar, different poses, lighting, scenarios, etc. If you have different aspect ratios in your dataset, do the same for the regularization images. You can generate these images with AI, by the way. Use as many as you want; it's better to have about 10 times the size of our dataset.

And now we have everything we need to finally start training. Open Kohya SS and go to Dreambooth LoRA, and here we start deciding some important stuff about our training process. You can import a configuration file if you have a pre-made template, but since we are learning how to use this, you probably don't yet. On the first page we will choose the model we are going to train the LoRA on: pick models that have a similar style to your dataset, so to train realistic, choose a realistic model, and to train anime,
well, an anime model. When in doubt about what model to use, use the base Stable Diffusion 1.5 model; it is a very versatile model that can give really good results with pretty much anything. For the anime training I use the model recommended to me by the wise one: AnyLoRA, created by Lykon. To use a custom model you will need to download it; I'll go to my Stable Diffusion web UI models folder, then the Stable-diffusion folder, and here I have the one I'm going to use.

Next is the Folders tab; we will need a particular folder structure. In the Tools tab you will see "Dreambooth LoRA folder preparation", and now we have some options. In "training images" we add our dataset folder's path, and if you have regularization images, add the folder path for those as well. Up here you will see "instance prompt" and "class prompt". These are only really useful if you're going to train without captions, which is possible by not specifying the extension of our caption files later, or with regularization images. If you do input a path for regularization images, you will train the AI on what "skere" means while, at the same time, training it on the images of a woman you have, making them "cosplay" as skere. As you progress in training, you will see the word "woman" slowly turn into "skere" when generating an image, creating a mix between your character and your regularization images, which will make your character more versatile, but will also keep some of the feel of the regularization images. In my case, I'll use "skere" as the instance prompt and "woman" as the class, using our previously made regularization images. This number we can leave at 1.

To talk about repeats, I need to talk about steps and epochs. Repeats represent the times the AI will cycle through our dataset, using one step for pretty much every cycle; once all repeats have concluded, the AI has trained one epoch. We will go more in depth in our advanced tutorial, but for this one, let's keep it simple.
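As a side note on that folder structure: as far as I know, the prepared training folders follow a "(repeats)_(instance prompt) (class prompt)" naming pattern, and the tool builds it for you. A sketch of that convention (the helper is hypothetical):

```python
# Sketch of the Kohya training-folder naming convention, as I understand
# the default layout: img/<repeats>_<instance prompt> <class prompt>.
# Hypothetical helper; "Prepare training data" does this for you.
import os

def kohya_image_dir(root: str, repeats: int, instance: str, cls: str) -> str:
    """Build the image folder path Kohya expects for a training run."""
    return os.path.join(root, "img", f"{repeats}_{instance} {cls}")

print(kohya_image_dir("train_skere", 14, "skere", "woman"))
# e.g. train_skere/img/14_skere woman  (separator depends on your OS)
```

The leading number in the folder name is how Kohya reads the repeat count, which is why the structure has to be exact.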
Forget about repeats and think of them as steps per image: if I used 40 repeats, the AI would spend 40 steps on each of the 16 images in our dataset, and that for every epoch you train.

Okay, analogy time. Let's say you have a history exam and your history book has 16 chapters; 16 represents the number of images we have. Steps represent the amount of time you dedicate to reading and understanding each chapter. But most of the time, the information won't just stick in your head on the first try, so the next day you come back for another study session: those study sessions are epochs. The second study session builds upon what you learned in the first, the third upon the second, and so forth. Steps will mainly vary depending on the complexity of the subject you're training, as well as how many images you have. And don't think the classic "fewer chapters means I have to study less time": bam, you're repeating next year. Having fewer chapters (images) to learn an entire topic from means we have to study them really, really hard, so the fewer images you have, the more steps you will need on each image. Everything depends on the subject, though; don't be afraid to experiment, and you can ask in the Discord for advice on a starting point.

For now, if you want a one-size-fits-all rule: the usual number of total steps to train a LoRA on the base Stable Diffusion 1.5 model is between 1500 and 3000, so calculate using how many images you have and how many epochs you want to train. For beginners, I recommend a scattershot approach where you train a lot of epochs on the first try and just leave it at that; you can train from 5 to 10 epochs, keeping in mind that it is an approximate number and only applies to the Stable Diffusion 1.5 base model; for AnyLoRA, for example, it is a little lower. For this beginner guide I'll be using 14 repeats and training 6 epochs: that makes a total of 16 images, multiplied by 14 steps each, and then by 6 epochs.
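That arithmetic can be sketched in a few lines (the function name is just for illustration):

```python
# Step arithmetic from the analogy above:
# total steps = images x repeats x epochs,
# and regularization images roughly double that again.

def total_steps(images: int, repeats: int, epochs: int,
                with_reg: bool = False) -> int:
    """Total training steps for a LoRA run."""
    steps = images * repeats * epochs
    return steps * 2 if with_reg else steps

print(total_steps(16, 14, 6))             # 1344
print(total_steps(16, 14, 6, with_reg=True))  # 2688
```

1344 sits a bit under the 1500-3000 rule of thumb for base SD 1.5, which fits, since the AnyLoRA model used here needs somewhat fewer steps.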
And yes, training with regularization images will double the number of steps again, but since it also slows the training by about half, we won't calculate it for now. I'll input everything, choose where you want to create the folders, and first hit "Prepare training data": this will make the folders necessary to train the LoRA, with all the naming and such already done. Then click "Copy info to folders tab", which pastes the info into the Folders tab. If for whatever reason it missed something, you can input your own path by clicking these icons, or with regular copy-paste. For the model output name, you can put whatever you want, but I recommend adding some training information so you remember what version this model was. The training comment I usually leave blank, but you could add the keywords there, for example, so you don't forget them.

And now, onto the spooky scary stuff: training parameters. The first thing you will most likely notice is the LoRA type. Here you can choose to train either a regular LoRA or a LyCORIS version; those are more advanced LoRAs that I will not go into in this video, the main reason being that they are way too advanced for me. Something else you will recognize here is the epoch number: for skere, I'm going to train 6 epochs, and I'm going to leave "save every n epochs" at 1, which will save a LoRA on every epoch; if you put it at 2, for example, it will only save three LoRAs in total. If you want to use captions, input your text files' extension, in my case ".txt". Batch size is how many images it will calculate at once, and what you input here depends mainly on what your GPU can handle; in my case I will use 4, which makes the training almost 4 times faster. Putting it lower can get slightly better results, but it usually isn't worth the time if you don't have a super GPU. Mixed precision is some weird stuff that I have no idea how to explain, and probably don't understand myself either, so just know that bf16 is the more advanced version and
requires a newer graphics card. I can't use bf16, so I will use fp16; both are perfectly okay to train on. The options on the right you can leave as they are, and for the seed you can do what you want; it doesn't matter for training.

What does matter is the next section: learning rate, scheduler, and optimizer. In this video I'll just explain what they are on a general note. The learning rate is, basically speaking, how fast the AI learns the subject, but it is not as simple as "okay, I'll put it extra high then", because that has its risks. A higher learning rate is like forcing the AI to be hyper-focused and stressed out about the exam, like how you crammed the day before the exam after playing League of Legends for the whole month you had: yes, you will probably get through the full 16 chapters in a day, and you might remember the overall story of what you studied, but you will probably mix up some details; maybe you won't remember some names or years. This can lead to trainings that look like your character, but lack some of their distinctive traits, for example the flower. On the other hand, a slow learning rate can not only lead to not learning everything in time; even if you do, in the end you focused so much on the details that you only know those, and maybe never fixated on the whole context of why stuff was happening. I don't know if this is what happens IRL, because I never studied like this, but it works for the analogy's sake. This could lead to the AI over-fixating on details of the character, for example making you have to prompt for the blue kimono. So: use a lower learning rate for characters, or things that need a lot of focus on details, even though we will need more steps for the training; high learning rates are nice for training models in less time and fewer steps, as well as for subjects that are more shape-oriented and don't have too many details.

The LR scheduler is the way the AI's learning rate will vary over time. We'll go more in depth in the advanced video, but the recommended ones are constant or
cosine. I will use constant for demonstration purposes, as it is the more predictable one; if you feel like your model overtrains really fast, then use cosine. For the optimizer, just use AdamW; there's not too much to say here. The other ones are usable, but AdamW 8-bit consumes less VRAM, so I would recommend that one, unless you're like me and can't use it because it crashes. For now, don't worry about the two separate learning rates: just use the UNet learning rate as a base and divide it by half to calculate your text encoder learning rate; the default values are usually good enough. Warmup you can leave at 10%; I will put it at 5, but some people don't even use it.

For network rank and alpha, you can just follow the rule of thumb. Network rank you can put at 128, which is what most people use; I went with 64. A lower alpha value will lead to more creativity in the model, and therefore more flexibility; it is also less likely to overtrain, but it can be less consistent, as well as take more time to train. A higher value will give you more consistency, but it is also more likely to overfit, and it will keep the style way more, so if you are interested in changing styles, probably use a lower value. Usually good values are 8, 16, or 32; you can go higher or lower depending on your training.

A very important part is the resolution and bucketing. The normal values for this are 512x512 or 768x768, so pick one of those two, unless you have a monster PC and don't care about anything in life. 512 will make the training way faster and will also consume less VRAM; I would actually recommend this one if these are your first LoRAs. The AI will also learn in fewer steps, which could make it overtrain quicker, but it is really good if you're just practicing and testing. 768 or higher will of course take way more time and VRAM, as it needs to train on higher-res images, but it will lead to higher-quality LoRAs and make the AI less likely to overtrain.
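To put the two rules of thumb from this section into numbers (a sketch with made-up helper names; as far as I know, Kohya scales the LoRA weights by alpha divided by rank internally, which is why a lower alpha means a weaker, more flexible adaptation):

```python
# Two rules of thumb from this section as plain arithmetic.
# Hypothetical helpers for illustration only.

def text_encoder_lr(unet_lr: float) -> float:
    """Text encoder LR = UNet LR divided by two (the tutorial's rule)."""
    return unet_lr / 2

def lora_scale(alpha: int, rank: int) -> float:
    """Effective LoRA weight scale, assumed to be alpha / rank."""
    return alpha / rank

print(text_encoder_lr(1e-4))  # 5e-05
print(lora_scale(32, 64))     # 0.5
```

So with rank 64 and alpha 32, the adaptation is applied at half strength, which lines up with the "lower alpha, more flexibility" behavior described above.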
Images in the dataset larger than the chosen resolution will be correctly downscaled to match it. As for images that are not square, you will have to enable bucketing: bucketing will allow you to input different resolutions, if you want the full content of the image. If you don't enable it, your images with different resolutions will be cropped at the center, or at random if you activate an option called random cropping, which can hurt your training if the auto-crop cuts the important part of the image out, leaving captioned things that are no longer there, which might end up confusing the AI.

With this, you're pretty much set to go, but looking into some advanced options will help; there are a few things that we may change. The first one for me is clip skip: if you're training realistic, you should put clip skip at 1, and for anime models put it at 2. As I'm training an anime character, 2 will be the way to go. The token length will depend on your captions, but it is very unlikely that you pass 75 tokens, and you probably shouldn't anyway. Another recommended checkbox to activate is "shuffle captions". Remember that some of those tags are our keywords, though, so we will need to use "keep tokens" to keep those at the beginning of the prompt; if you use more than one keyword, match that with the number of tokens kept. xformers will make your generations way faster, as well as save a lot of VRAM, which will also allow you to use bigger batches, so I'd always activate that.

With this, you're ready to go and can hit Train. You will see up here that the training is starting; if nothing crashes, it should take a little while, depending on your PC. As you can see, training this many steps will take me more than 3 hours. Luckily, I actually trained it in 5 minutes using today's sponsor, dreamlook.ai, the absolute perfect solution for people who want to train models but don't want to spend thousands on a new PC. We will use expert mode; as you can see, the parameters are still really simple. This is because instead of training a LoRA directly, it trains a whole
model and then extracts the LoRA from it. This is actually the best way to train stuff, but it usually takes super long amounts of time that common mortals like us can't afford; not a problem for these people, though, as they have a method so optimized it trains things at light speed. For the captions we will need a JSON file; we talked to them about making it easier for casuals, and they are working on it. In the meantime, you can use the Python script I left in the description: just run it, select the folder with your dataset, then select where you want to save the JSON file, and that's it, now you have it. We upload it and play with the parameters. This one is mainly for businesses, so I'll ignore it for now. We want to extract a LoRA from it, of course, and we will use face cropping. For the base model I'll use Anything V3, as it is an anime model. Change this to our keyword; the learning rate I leave as it is, but we will double the steps to ensure the training is fully done, since our goal is to end up with a nicely trained model. Clicking Run will consume 10 tokens; they give 25 for free, so you can test it out. Down here you will see the training, and in just 5 minutes it finished the full thing, while my LoRA still has such a long way to go. Now I'll download the LoRA and model, and there you go.

Let's test the results. Since I'm training and don't want my PC to explode, I will use dreamlook's AI generation service. I will create 8 images with the model, this strength, and this prompt, and as you can see, we have a pretty well trained model. Testing the LoRA later on local, it was a little undertrained, which is completely normal; using 1.3 strength, we get the desired results. The first 30 of you that use the code NOT4TALENT will get a 20% discount on the first purchase. There are also Google Colabs to train LoRAs, but I haven't been able to complete a single one without paying, and they are such a pain to use; you can totally try them, though.

Now, back to my local training, three hours later. When it's
finished, you will see here that it is at 100%, and no, there is no congratulations message. The last step will be to test the models, to see if they work and which one is the best. For this, I'll go into the models folder and copy all the LoRAs we just trained into our Stable Diffusion web UI's models/Lora folder; I made a custom subfolder for this character. Next, in Stable Diffusion, we will look for the first one of them all: "skere", multiple zeros, and a one. I prompt for something that allows us to see the whole character, using the keyword of course, and then we will use the XYZ plot script to swap between the epochs we saved. So on the X axis, I use the "Prompt S/R" (search and replace) option and put the name of the LoRA, then copy and paste it multiple times, separating with commas and replacing the last number with the next: one, two, three, four, five. When you reach five, you know that the next LoRA is actually the one that has no number.

Once you have this, you can generate, but I like to test something else at the same time: I'll see what happens if we don't use the word "woman". For this, I'll add a search and replace on the Y or Z axis too; you can test as many things as you want. After the first image has been generated, you will see how in the first epochs the character is not really well understood, and the more we advance in training, the better the AI is at replicating it, even though sometimes it will start mimicking our dataset or replicating things that we don't want. I probably like epoch 5 the most. If it looks nothing like your character, or you're not satisfied with the results, comment in our Discord server so we can help. Of course, you can also learn more about LoRAs yourself by watching this video right here, where we go deep on how to make you a LoRA training pro. Thanks again, and see you!
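As a footnote on the epoch-swapping trick above: building that comma-separated "Prompt S/R" list can be sketched like this, assuming the default "-000001"-style epoch suffix on saved files (the helper name is made up):

```python
# Build the comma-separated list for the X/Y/Z plot "Prompt S/R" option:
# one entry per saved epoch, plus the final LoRA saved without a suffix.
# Hypothetical helper; assumes the default "-000001" epoch naming.

def epoch_sr_list(name: str, epochs: int) -> str:
    """Return the search-and-replace list covering every saved epoch."""
    entries = [f"{name}-{i:06d}" for i in range(1, epochs)]
    entries.append(name)  # the last epoch is saved with no number suffix
    return ", ".join(entries)

print(epoch_sr_list("skere", 6))
# skere-000001, skere-000002, skere-000003, skere-000004, skere-000005, skere
```

Paste that into the X axis with the first entry in your prompt, and the grid walks through every epoch in one generation run.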
Info
Channel: Not4Talent
Views: 75,930
Keywords: stable diffusion, sd, stable difusion, promting, prompt, tutorial, guide, stable diffusion prompt guide, stable diffusion tutorial, ai art, ai, beautifull ai art, create good prompts, talk to stable diffusion, kohyass, kohya, LORA, training, lora training, dreamlookai, dreamlook, dreamlook.ai, characters, consistent, dreembooth, json, understand, learning rate, scheduler, constant, cosine, warm up, optimizer, network rank, network alfa, dim, bucketing, buckets, 512, 768, not4talent, notatalent, notfortalent
Id: xXNr9mrdV7s
Length: 27min 33sec (1653 seconds)
Published: Sun Jul 16 2023