SDXL 1.0 vs SD 1.5 Character Training using LoRA Kohya ss for Stable diffusion Comparison and guide

Captions
Hello. This video explains how to train a realistic LoRA model of a real person using Stable Diffusion 1.5 and SDXL, and compares the differences between the two. Our subject is Katheryn Winnick, and the dataset looks like this one. I grabbed this data from Google, so there are some inconsistencies; for example, this image is slightly different from this one and from this one, but in general they are all of the same subject. One problem in the data is that some of the full-body shots are not very clear; in general, though, some of them are acceptable or even good.

These images were produced by Stable Diffusion 1.5. You can see that the results are hyper-realistic and very similar to the training subject. We will see how using high-resolution images produces better results than low-resolution images; here, for example, the level of detail is extremely high. In my training dataset I have only one image with a smile showing teeth, yet we can see that the teeth were learned quite well for the subject.

Regarding the dataset: it must have variety, meaning different clothing, different backgrounds, and so on. As I said, I have one image with teeth, and 23 images in total, all of them either 1024x1024 or 640x1024 in size.

Here are the results from Stable Diffusion 1.5. The portraits are perfect and the half-body shots are also really good, but for full-body shots we need to use ADetailer, for example, to produce high-quality faces, because we don't have much high-quality full-body data. We can also see here that the subject can wear different types of clothing that do not exist in the dataset; I don't have yellow, blue, or green clothes in the data, yet I generated different clothing. I trained two models for SDXL, one
with 128 network dimension and one with 16 network dimension. We can see that these results are also highly realistic; the level of detail, especially the skin, is very good using a network dimension of only 16. Since a network dimension of 16 produces almost the same results as 128, it is very much recommended to reduce the network dimension in order to reduce the file size. The level of detail is very high, especially for the eyes; the hair is okay. These are different clothes and different body shots for the same character. This is using Photon for Stable Diffusion 1.5, but it also uses ADetailer.

According to Stability AI, the recommended image size for training or generation is approximately 1024x1024. We can use lower-resolution images as well, but higher resolution produces better results. If we have full-body shots, we can fit them into such a resolution or use resolutions such as 768x1344 pixels. For my dataset I will use a lower resolution, 640x1024, because I wanted to train the same data on both SDXL and SD 1.5.

Regarding the folder structure for LoRA training, as explained in the previous video, you need a folder for the output (such as "model"), a folder for the images, and a folder for the class (regularization) images. Inside the image folder we create another folder and put our images in it. The naming convention is the number of repeats, such as 20, then an underscore, then the instance prompt, a space, and the class prompt, which is "woman". I will use an instance prompt like "xyzkwv1", for example, to avoid having any real names in the instance prompt and to avoid any confusion with the SD checkpoint in case the name, or similar names, already exists in it. For the class folder we put the number of repeats, 1, an underscore, and the class, "woman". The contents of the class folder should be high-quality images generated by the same Stable Diffusion checkpoint, or realistic photos; it's better to have realistic images because
these images will be trained along with the training data, so they will affect the training results. The better the quality of the regularization images, the better the output will be, so it's best to have high-quality regularization images of the same size as the training data.

Next we choose the folder where the images are located, then we remove from the captions all the features that we want to be part of our LoRA: for example the girl's smile, the color of the hair, the color of the eyes, and so on. These are features of the character, so we have to remove them from the captions. We will also use a trigger word, which is the same as the instance prompt. This trigger word will be placed in each of the caption files. It is especially important to include the trigger word when we are training without regularization; when using regularization, the trigger word becomes less effective.

After captioning is complete, we can double-check the captions using a Booru tag manager, make sure that all captions are correct, and update them where needed. We should usually check each caption and make additions if necessary, because automatic captioning might not convey all the information we need, or may produce wrong captions in some cases.

Now we go back to the Kohya_ss graphical user interface, select the source model as Custom, and choose an existing base model: Photon for SD 1.5, or, if we are training SDXL, the SDXL base model. We then turn on the SDXL model checkbox to the right. Next we go to the Folders tab to set up the folders: we select the image folder, optionally the log folder, the output folder, and the class (regularization) folder, then we enter the name of the model we want. After this point we go to the Parameters tab and select a suitable set of parameters for the training. We choose the LoRA type: standard LoRA, or a LyCORIS type such as LyCORIS/LoCon to train LyCORIS files. We may
train LyCORIS if we see that the LoRA results were not satisfactory, for example. For LyCORIS we would often use a lower network rank, for example 32 or less for Stable Diffusion 1.5, with a network alpha of 4 or lower, and a convolution rank of 4 or 3, for example, with a convolution alpha of 1 or a lower value. Having a lower alpha makes the training smoother, since the effect of the LoRA weights becomes lighter.

I will train a standard LoRA, so I choose Standard. We'll use a batch size of 1; you can use a higher value if you have a stronger GPU. Choose a suitable number of epochs depending on the number of repeats and the number of images used; I usually use 5 to 10 epochs with 10 or 20 repeats. Use mixed precision bf16 for an NVIDIA RTX GPU. I will keep the defaults for the learning rate and scheduler because they really work well. I choose a maximum resolution of 1024x1024, depending on your data; mine has a maximum of 1024 pixels. I will also enable bucketing because I have images of different resolutions. For a standard LoRA we usually don't require a network rank of more than 64 when we have 23 images, for instance, but since I have large 1024-resolution images I will use a higher network rank of 128.
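The rank/alpha and epoch/repeat choices above reduce to two small calculations: the LoRA update is scaled by alpha divided by rank (so a lower alpha makes the weights lighter, as described), and the total step count is images x repeats x epochs / batch size, doubled when regularization images are used. A minimal sketch of both, assuming Kohya's usual step accounting; the function names are illustrative, and the example numbers are the ones used in this video:

```python
def lora_scale(network_alpha: float, network_rank: int) -> float:
    """Effective multiplier applied to the LoRA weights (alpha / rank)."""
    return network_alpha / network_rank

def total_steps(images: int, repeats: int, epochs: int,
                batch_size: int = 1, with_reg: bool = True) -> int:
    """Total optimizer steps; regularization images double the per-epoch count."""
    steps_per_epoch = images * repeats * (2 if with_reg else 1)
    return steps_per_epoch * epochs // batch_size

# Settings from the video: alpha 32 on rank 128 applies the weights at
# a light 0.25 strength,
print(lora_scale(32, 128))      # 0.25
# and 23 images x 20 repeats x 8 epochs with regularization matches the
# 7360 total steps the trainer reports for the SDXL run.
print(total_steps(23, 20, 8))   # 7360
```

This also makes it easy to predict how long a run will take before starting it in the GUI.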
The higher the network rank, the more information the LoRA can preserve and learn; however, this can also make it overfit faster. I'll use an alpha value of 32 or lower, which usually produces finer results than a higher alpha value.

In the advanced configuration we can turn on gradient checkpointing if we have low VRAM, as in my case; because I'm training at 1024x1024 my GPU cannot handle it otherwise, so I turn this option on and I am able to train a standard LoRA file. In the sample images config it's very useful to set a sample prompt so that we can see the output from Kohya. This allows us to stop early, for example if we see that overfitting is happening or that further training is not producing better results, which saves us a lot of time.

Now we start the training. We can see the buckets used: one is 640x1024 and the second bucket is 1024x1024. The number of regularization images used is 460, with a regularization factor of 2, and together with the 23 training images Kohya reports 683 images loaded. During the training you can also edit the prompt that generates the sample images by going to the model folder, then the sample folder, and editing the prompt file manually. Now we check the results from Kohya_ss. This gives us an initial preview of the images so we can decide whether to continue training. We can see from the first epoch that the training results are good; the second epoch is really good; then the third epoch. It seems that starting from the second epoch the results are very good and resemble the target. This goes up to epoch 8, so I stopped the training. We will do further comparison in Stable Diffusion, but it seems that all epochs starting from the second are producing good results.

Now for the SDXL model: because I'm training SDXL, I need to turn this option on. Regarding the folders, it's the same workspace; my data is in the "data" folder, the images are in an image folder, and then there is
the regularization (class) folder, while the output folder would be "model". Now we can name the model output, "xyzkwv1-sdxl" for example. For the logs we can create a log folder as well and call it "log". Then we set the parameters: we choose standard LoRA training, batch size 1, and for the epochs let's set it up to 10, though most likely I will use only 8. We set the mixed precision and the remaining settings; I'll be using bf16 because we are on an RTX GPU, and I will use the same settings as before. We could also use the AdaFactor optimizer, for example, for lower VRAM. The resolution will be 1024, the same as the standard LoRA we trained earlier. We could also pass extra arguments if we are using AdaFactor or other optimization settings, and we enable "No half VAE" for SDXL. Otherwise this will not be very different from the first LoRA.

In case we, or someone else, ever need to merge this LoRA with another one: to merge LoRAs they need to have the same network dimension, which is very useful to keep in mind. I will use the same seed, seed 1, that I used for the standard LoRA; this makes comparisons much easier. We can use "cache latents to disk" if we want to run different iterations or try different settings. I will use xformers, and I won't use gradient checkpointing because we have enough VRAM. Regarding the setting called noise offset, I will use this value; it is the value that SDXL was trained with, so I will use the same value here.

We can see that the training has started. As I've said, I'll use only 8 epochs, so for this model there will be 7360 total steps; I might stop the training earlier. When I used a network dimension of 128, the resulting file size was very large in comparison to Stable Diffusion 1.5, which means we need to reduce the dimension if we want a smaller file size. If we check the resulting images, we
can see that starting from epoch 1 the generation is good, with a high level of detail; at the second epoch it is already getting very strong and the level of detail is very high. For the next run I will reduce the network dimension down to 16, for example, or 32; so let's reduce this to 16, reduce the network alpha to 4, for example, and check how the results turn out. Then we check the samples once again: this is epoch 7, and here are epoch 6 and epoch 5. I don't think epoch 7 is improving anymore, so I will stop the training.

Now we do the comparison. For the testing, first we try Stable Diffusion 1.5, for example Photon. We create a simple prompt, then we go to Scripts, choose the X/Y/Z plot, and use Prompt S/R (search/replace). We start with the prompt here, and I will replace this part with the epochs from 2 to 7; I will not use epoch 8, and the first epoch did not look so good, so we just go from 2 to 7. When we run the results, we see that even generating at only 768x768, despite training at 1024, we get perfect results; they are really good and very close to the training data. Likewise, generating at 1024x1024 the results are also great. If we do several tests, we see that the standard LoRA can produce amazing results with a high level of detail. If we come here, for example, we can check the individual images and see that they are really excellent and very similar to the subject. We can also compare portraits, cowboy shots, full-body shots, and so on, and check the results. Usually Stable Diffusion 1.5 does not produce great full-body shots because that requires higher-resolution images, but using ADetailer can resolve this: for example, if we copy this prompt and put it in ADetailer, it becomes possible to produce high-quality faces when the details
are not so perfect. The results are hyper-realistic and very similar to the training subject; the level of detail is extremely high. As noted earlier, my training dataset has only one image with a smile showing teeth, yet the teeth were learned well. With Stable Diffusion 1.5 the portraits are perfect and the half-body shots are really good, but for full-body shots we need ADetailer to produce high-quality faces, since we don't have much high-quality full-body data, and the subject can wear different types of clothing that do not exist in the dataset (yellow, blue, green, and so on). For SDXL, the results are also hyper-realistic, especially the skin, even with a network dimension of only 16; since 16 produces almost the same results as 128, it is very much recommended to reduce the network dimension to reduce the file size. The level of detail is very high, especially for the eyes; the hair is okay; and we get different clothes and different body shots for the same character. So this is it, and have a good day.
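The epoch comparison described above boils down to swapping the LoRA checkpoint suffix in the prompt. A minimal sketch of building the value list for the X/Y/Z plot's Prompt S/R axis, assuming Kohya_ss's default per-epoch naming (a zero-padded `-000002`-style suffix) and using the "xyzkwv1" model name from the video as an illustrative example:

```python
# The prompt contains the first value (e.g. inside <lora:xyzkwv1-000002:1>),
# and Prompt S/R substitutes each listed value in turn, one image per value.
model = "xyzkwv1"  # illustrative model name, matching the video's instance token
values = [f"{model}-{e:06d}" for e in range(2, 8)]  # epochs 2 through 7

sr_axis = ", ".join(values)
print(sr_axis)
# xyzkwv1-000002, xyzkwv1-000003, ..., xyzkwv1-000007
```

Pasting this comma-separated string into the Prompt S/R field produces one column per epoch, which is exactly the 2-to-7 grid used for the comparison in the video.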
Info
Channel: How to
Views: 21,360
Id: vA2v2IugK6w
Length: 17min 55sec (1075 seconds)
Published: Thu Aug 10 2023