Greetings everyone. Today I am going to show you how to install
the Automatic1111 Web UI for Stable Diffusion XL, aka SDXL. How to enable quick VAE selection and use
the correct SDXL VAE. How to use your trained SDXL LoRAs with the Automatic1111
Web UI; not only for SDXL, the same rules apply to SD
1.5 and SD 2.1 models as well. How to assign previews to your LoRA checkpoints. How to do an x/y/z grid checkpoint comparison
test to find your best LoRA checkpoint. How to sort your generated images with best
resemblance according to a reference image. How to update your existing Kohya GUI to the
latest version or install it from scratch with the correct installation settings. How to prepare your training dataset images
properly for your Kohya training. How to do SDXL LoRA training on Kohya with
the most VRAM-efficient settings for GPUs with lower VRAM. I will also show the best settings if you
have a strong GPU such as an RTX 3090. Why I use ground truth classification / regularization
images, with an explanation. Detailed information about the repeating logic
of Kohya, which is a very commonly asked question. A very detailed explanation of Kohya settings
and parameters for training. How to manually execute training command from
command line interface for Kohya GUI. How to save state of your LoRA training and
continue from saved state. And how to get such good images as you are
seeing right now with LoRA inpainting. I have prepared a very detailed GitHub readme
file. All of the instructions and links that you
are going to need will be in this file. When watching this tutorial, follow the instructions
written here, because by the time you watch this tutorial, some of the things that I show could be outdated due to the speed of AI development. However, I will keep updating this GitHub readme
file. To be able to follow this tutorial, you need
2 things installed on your computer. The first one is Python. If you don't know how to install Python, I
have this excellent tutorial; watch it. For the Automatic1111 web UI, Python 3.10.6 is
suggested. However, I suggest you install Python 3.10.11, because the latest xFormers, which is our main
optimizer library, now requires 3.10.11, and 3.10.11 works very well. Do not use Python 3.11; it won't work. It has to be a 3.10.x version. You also need Git. After you have installed Python and Git, you are
ready to follow this tutorial. So I will show you how to download, install,
and use the Automatic1111 SD web UI with SDXL. I already have an automatic installer. Installing it manually is very easy, but
I keep my Patreon post up to date as well. So let's do both the automatic and the manual installation. When you go to the Patreon post, you will
see the auto_1111_sdxl.bat file and the download_sdxl.py file. Download both of them. Move the files into any folder where you want
to install your Automatic1111 web UI. This installation will not break or modify
any of your existing installations. Double click the auto_1111_sdxl.bat file, click
more info, click run anyway, and it will clone the Automatic1111 web UI repository, install
it with xFormers, and also it will download SDXL model files for you. Okay. Let's also do the manual installation. So for manual installation, we will begin
with cloning the Automatic1111 web UI. Copy the repository URL, make a new folder, and start a new
CMD inside that folder like this. Begin with cloning. So the Automatic1111 web UI has been cloned. Let me also show you my Python version: start a new CMD and type python, and it will display the Python version.
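Here is a minimal sketch of those commands in CMD; the repository URL is the official Automatic1111 one, and the folder name is just an example:

:: make a folder and clone the Automatic1111 web UI into it
mkdir sdxl_webui
cd sdxl_webui
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git

:: print the Python version that is on PATH
python --version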
As you are seeing right now, this is important: you need to see a Python version like I am seeing, 3.10.11. Okay. After the repo has been cloned, enter the
repo, right click the webui-user.bat file and edit it. If you are not seeing the file extension, go to
view and in here, check File name extensions. Then you will see the extension; right click it. You can edit it with Notepad, Notepad++, anything; I prefer Notepad++. In the command line arguments, we will define
--xformers and --no-half-vae. That's it. Save it. Then double click the file and it will start installing
the Automatic1111 web UI from scratch for you. If you are wondering why I added --xformers
and --no-half-vae, these are command line arguments. When you click this link, you will see
all of the command line arguments of the Automatic1111 web UI. SDXL requires --no-half-vae because the SDXL
VAE is not stable in half precision, so we keep it in 32-bit. xFormers is an optimization library; it will speed up our image generation and our training as well.
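As a reference, the edited webui-user.bat should look roughly like this; it is the stock template with only the COMMANDLINE_ARGS line filled in:

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--xformers --no-half-vae

call webui.bat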
As you are seeing right now, the automatic installer
is also downloading the model files automatically for us. So you need to download the SDXL model files
if you haven't yet. I have put the direct download links here. This is the corrected VAE file; open this link and it will start downloading
automatically for you, as you are seeing right now. Then we need the SDXL 1.0 base model; I prefer this one for LoRA training. Open this link; these are direct links, and it will start downloading
that file as well, as you are seeing right now. Once the files have been downloaded, you need
to put them into the correct folders. Cut the sd_xl_base_1.0.safetensors file from Downloads,
move back into your installation folder, and enter models, then Stable-diffusion;
this is the folder where you need to paste it. Then return to Downloads, cut the sdxl_vae.safetensors
file, move back into your fresh installation, enter models, then the VAE folder, and
put it there. These are the folders where you need to put your files.
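If you prefer doing those moves from the command line, a minimal sketch looks like this, assuming the files are in your Downloads folder and the web UI was cloned into a folder named stable-diffusion-webui; adjust both paths to your own locations:

:: base model goes into models\Stable-diffusion
move "%USERPROFILE%\Downloads\sd_xl_base_1.0.safetensors" "stable-diffusion-webui\models\Stable-diffusion\"

:: fixed VAE goes into models\VAE
move "%USERPROFILE%\Downloads\sdxl_vae.safetensors" "stable-diffusion-webui\models\VAE\"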
When you see "Running on local URL" like
this, that means the installation has been completed. The rest is the same for both the automatic installer
and the manual installation. So we need to open this URL: copy it, open
it in a new window, and you will see the SDXL base model is selected. Are we done yet? No, not yet. We need to set the VAE that it is going to
use. If you are wondering what a VAE is: with Stable
Diffusion, we are working in latent space, and the latent resolution is 1/8 of
the generated image. So for SD 1.5, it is 64 by 64; for SDXL, it is 128 by 128, but we are generating
1024-pixel images. Therefore, the VAE is used to reconstruct the latent
image into the final output image. So how are we going to set our VAE? Go to Settings, go to User Interface,
and find the Quicksettings list. When you click it, it will list all
of the options. Type VAE and select sd_vae, then apply settings
and reload the UI. After you have reloaded your UI, you will
see SD VAE set to Automatic. So what does Automatic mean? Automatic means it will look for a
VAE with the same name as the selected safetensors checkpoint. When you select None, it will use the VAE embedded
in the checkpoint. Then you can select a specific VAE from here,
and we will use this one. Why? As I put in the link here, open it: I made a comparison between the SDXL 1.0 embedded
VAE and the separately released VAE, which is the one we downloaded, and you will see the difference. With the embedded VAE it generates noise like
this; you see those green pixels in the image. With the downloaded VAE, you won't
see them. So the embedded VAE has problems, and that is why Stability AI released this fixed
VAE file. Now we are fully ready to start using SDXL
with the Automatic1111 web UI. There is also the SDXL refiner, but it is not
officially supported by the Automatic1111 web UI yet; it is in development. I am following the news and will hopefully make a
new video when it is merged into the main branch. So you can type anything, for example: photo of a speed car. SDXL works with 1024 by 1024 pixel resolution,
don't forget that, and you are ready. Just hit Generate, and this is our generated
image. Now we can begin installing Kohya SS for our
training. To be able to use the Kohya SS LoRA training
scripts, you first need to have the Visual Studio redistributable package installed. The link is here; open it and install it. After that, we will begin with cloning
Kohya SS. This is its repository. Let me open it. So this is the repository that we are going
to clone and run. It is very simple to install. Copy the repository URL, make a new folder, and enter inside
it. Open a new CMD and paste the clone command. It will clone the Kohya SS repository into
that folder, and you will get a folder like this. Open the folder, find the setup.bat file, and
double click it. It will ask you a bunch of questions. It expects Python 3.10.9; however, Python 3.10.11 also works. We are going to select option 1, Kohya SS GUI installation.
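Here is a minimal sketch of the clone step in CMD; the URL is the official Kohya SS GUI repository, and the parent folder name is just an example:

:: make a parent folder and clone the Kohya SS GUI into it
mkdir kohya_training
cd kohya_training
git clone https://github.com/bmaltais/kohya_ss.git

:: then run setup.bat from inside the cloned kohya_ss folder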
You don't need to install the cuDNN files anymore,
because we are going to install Torch version 2 and it comes with the proper cuDNN files. Then select option 2 here, Torch version 2. This is really important; do not select Torch version 1 anymore. This screen may take a while. When you open your task manager, go to the
Performance tab, and when you select your Ethernet, you will see the download activity like this, because currently it is downloading and then
installing the downloaded libraries. You won't see any messages until it completes
the current package and moves on to the next one. So this is all you need to do for the installation. Now let's say you already have Kohya SS installed. How can you update it? Open a new CMD inside that folder, then type
git pull. It will fetch whatever is new. Then run the
gui.bat file and it will update the necessary libraries, if there are any, and you will have
the most up-to-date version of Kohya.
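A minimal sketch of that update flow in CMD, assuming you are already inside the kohya_ss folder:

:: pull the latest Kohya SS GUI code
git pull

:: start the GUI; it upgrades any changed dependencies on launch
gui.bat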
This is really important, because Kohya is updated frequently due to the SDXL model, so make sure you have
the latest version. Recently, one of my Patreon supporters was getting an out-of-VRAM error, and it turned out that his Kohya installation was not up to
date. After we updated it to the latest version, the
out-of-VRAM error was gone. If you ever feel like your installation has
stopped, click inside your CMD window and hit Enter. Sometimes clicking inside a CMD
window pauses its output, so hitting Enter
will make the progress continue. After a while, when all of the dependencies
are installed, you will be asked these options. Select This machine, hit Enter. Select No distributed training, hit Enter. Do you want to run your training on CPU only? Type no, hit Enter. Do you wish to optimize your script with Torch
Dynamo? Type no, hit Enter. Do you want to use DeepSpeed? Type no, hit Enter. What GPUs do you want to use? Type all, hit Enter. Now this is important: if you have a GTX 1000 series card, use FP16. If you have some special kind of card, look up
whether it supports BF16 or not. If you have an RTX 2000, 3000 or
4000 series card, select BF16. So I am going to select BF16 since I have
an RTX 3060 and an RTX 3090. So BF16 is selected and we are done. That was all of the installation. What if you want to install on RunPod
or on Linux? I have an auto installer here for RunPod; it should also work on other Linux systems. You probably only need to change the workspace
folder path. This is the video where I have shown how to
install on RunPod, and its readme file is up to date. Hopefully, I will make a new video for Kohya
installation on RunPod for SDXL LoRA training. After the installation has been completed, you
can close this CMD and start the Kohya SS GUI in your browser. I will show you how to start it. So let's close this. Enter the folder where you have installed
Kohya. Double click gui.bat file and it will start
the gui automatically. So let's say you have updated your Kohya installation. On this screen, you will see whether any library
is updated, upgraded or not. So this is my current installation. Let's open this link. And this is our Kohya GUI LoRA training or
DreamBooth training screen. It also supports Textual Inversion. Hopefully, I will also work on DreamBooth
training for SDXL and make amazing video for that as well. So for today's topic, we are going to do training
with LoRA. LoRA is an optimized version of DreamBooth training. If you are wondering what LoRA training is,
I have this master tutorial, which is about 1 hour; watch it. Also, if you are wondering about rare tokens, class
images, and DreamBooth training in general (it also covers LoRA training somewhat), watch this amazing tutorial; you won't regret it. So this is the Kohya LoRA GUI. First of all, we will begin with selecting
our source model. You see there is a quick selection of models; however, we will use a custom model, so select Custom from here. Then it will ask you for the pretrained model name
or path. Click this icon. If you are on RunPod, you would need to
give the full path, but here we can just browse. We downloaded our model
inside our Automatic1111 web UI folder, so I will go there: models, Stable-diffusion,
and the SDXL base 1.0 safetensors file. So this is the base model file on which
I will do my LoRA training. Don't forget to check the SDXL model checkbox. After doing this, make sure that you
have saved your configuration. I have also shared my configuration on GitHub; when you go to this section, you will find
the settings I used in this video. After downloading it, change the extension
to .json. Let me show you that: so this is the file, I change the extension to .json and click Yes. Now I can load it. Let's load it: open Downloads, this one, and it will load all of the settings
I have. However, if you do this, you also need to
change your image folder path, regularization folder path, output folder path, logging folder
path, and the model output name if you wish. So let's start from zero again. OK, so I have selected my base model, selected
SDXL. I will save my configuration. For saving, click Save as. Where do I want to save my configuration? Let's save it inside this folder; you can save it wherever you want. I will save it as test1 and click Save. It will generate this file and save whatever
configuration you make. After each change, don't forget to click the Save
button. When you do that, you will see a message
in the CMD window where your Kohya GUI instance was started; whatever we do, we will see it there. So we will begin with the preparation of the training
images. This is really important. But before that, I noticed that I am using
the DreamBooth tab. Don't make this mistake: we are doing LoRA training. So select the LoRA tab and let's make the same
changes quickly. Select the model: Stable Diffusion,
SDXL base, check SDXL model. Save as, pick the save file here, save,
OK. Make sure that you are in the LoRA tab. Now we can begin with the preparation
of the training dataset. Click Tools. In here you will see Deprecated; I don't know why they deprecated it,
but this is the best way of preparing your training images, otherwise you are likely to make mistakes. So the instance prompt will be ohwx. However, there is also a debate that you should use a celebrity
that is like yourself if you are training yourself. I will hopefully make a video about that and
compare the celebrity-name approach versus the rare-token approach. But for this video, I will use ohwx. The class prompt will be man. As I said, if you don't know what these
rare tokens, class tokens, and class images are, watch this excellent tutorial. Okay, training images: this is really important. Currently, I am using these images, and they
are not good, not optimal. Why? Because they have repeating clothing and repeating
backgrounds, which makes the model memorize those as well. Hopefully I will make an amazing how-to-prepare-a-good-training-dataset
tutorial very soon, but for now we will use this. As I said, this is not a very good training
image dataset. All of the images have the same resolution; let me show you:
right click, Sort by, More, then select Width and Height from here, then View, Details. Okay, all of the images are 1024 by 1024. Kohya scripts support bucketing as
well, which means training with different-resolution images. But I suggest you use same-dimension
images for both training and classification images, because with one of my viewers I saw
that bucketing was causing an error; I had to connect to his computer and try to find out
the reason, and the reason was that he had so many different
resolutions that bucketing was failing. So these are my training images. Copy the path; you can select the path from here, Ctrl+
C copy and paste the path here. Alternatively, by clicking here, you can select
that path. Now for regularization images. This is a really debated topic, but I will
show you proof of my approach right now. I am using the ground-truth regularization / classification
images approach. So what is that? When you click this link, you will get to
this page: EveryDream2trainer, which is another trainer script. Hopefully I will also do a training with
it and compare the results with Kohya and the DreamBooth extension of the Automatic1111
web UI. In here it is very well explained: ground truth means real images,
like the ones originally used to train Stable Diffusion, rather than images generated by the model. So what are they in this case? They are real images of people, like you are
seeing right now. These are my classification images. I spent a huge amount of time preparing them; those images are shared in this Patreon post. You can also prepare your own ground truth regularization
/ classification images. Using very high quality real images is the
key to ground truth images. I have shared these images in many
resolutions. They are collected from Unsplash.com, so they
are freely available to use, even for commercial purposes. Images in all of these aspect ratios are shared
on my Patreon, and from this link you can download them. So how did I generate perfect-quality images in
so many aspect ratios? When you open these images, you will see
that they are in the perfect ratio with the maximum possible resolution. I have explained the scripts that I used in
this tutorial video. Watch this tutorial video to learn more about
my subject cropping script. I shared the script on the video for free. You can also download it from my Patreon. The script is included in this post as well
as with the previously prepared images for you. They are also not just auto-cropped. They are also preprocessed. What do I mean by preprocessing? When you watch this excellent tutorial video,
you will see the preprocessing scripts. With preprocessing I have eliminated outlier
images from the dataset, which are minimal when you use Unsplash. But still, there are some images that need
to be eliminated. For example, if there are 2 people in 1 image,
I eliminate that image: I deleted it or moved it out of the folder. So we don't have any classification / regularization
images with 2 people in them. I also manually verified all of the images,
then shared them with you. So it was a huge task for me. Therefore, if you support me on Patreon, I
would appreciate that very much. Still, you can find all of the scripts in
these tutorial videos. They are YouTube videos, open them, and then
you will be able to replicate what I have done. Since we are doing training with 1024 by 1024
pixels, I will use this dataset at this resolution. Let me show you their resolution: right click,
Details, and you see all of the classification images are the same resolution. If you don't use same-resolution classification
images, especially in the DreamBooth extension of the Automatic1111 web UI, you will get an error and it will try to generate new images; I don't know what the behavior of the Kohya
scripts is. So make sure that your classification images
are the same dimensions as your training images, for best results. Let's copy this path, go over to the Kohya GUI, and paste it. Now, the number of repeats, which is one of the
most confusing and debated topics of Kohya. I have asked the original developer of Kohya about
the repeating logic; click this link. You will see that I asked about four cases,
and thankfully Kohya SS replied to my questions. Moreover, we are discussing how
the repeating logic can be improved. You can also leave a comment there; I hope you do, as I am hoping that Kohya will improve the
repeating logic. So read this thread and you will understand
the repeating logic. I will use a repeat count of 25. So in one epoch, it will train each image
25 times, each time with a different regularization image. How many regularization images you would need,
you can easily calculate: it is the number of training images multiplied by the
number of repeats. So with 13 training images, I need 325 classification images in total. If I had 20 training images, I would need
20 multiplied by 25, which is 500 classification images. This is how you calculate the number
of classification images that you need. Now the destination training directory: this
is where the generated LoRA model checkpoints and other files will be saved. If you want to use them directly with the Automatic1111
web UI, you can do this: click this folder icon, browse to wherever
you have installed your Automatic1111 web UI, enter models, enter Lora,
and select it there. So our generated checkpoints will be saved
inside the Lora folder, and therefore we will be able to use the generated
LoRA checkpoints immediately and very easily. After doing this, click Prepare training
data. Once you click it, you will see the messages
here. When you enter that folder, inside
Lora, you will see all of the folders generated like this. Make sure that you get the "done" message, because
it copies all of the images into that folder: you see all of the class images are copied, then all of the training images are also copied. After that, don't forget to click Copy info
to folders tab. Then you are ready; you have prepared the training image directories
like this. And the model output name: this is important. Whatever model name you give, the files
will be saved with that name. Let's say 12vram; this will be my model output name, and then
click save. Don't forget to save your changes. Are we ready? No. Now we need to set training parameters and
this is where LoRA training becomes very complex, because there are so many different LoRA types and so many different optimizers; each one has different effects and different
settings, and finding optimal settings is very hard. I have done over 20 trainings and I will share
my best training values with you. Train batch size: make this 1. Why? Because in machine learning, if you make the
training batch size higher, you tend to reduce generalization. Moreover, an increased training batch size requires
a different learning rate, so if you increase your training batch size,
you need to test increased learning rates as well; the learning rate will not be the same. Number of epochs: I train people with up to 200 effective epochs. Since my repeat count is 25, I will train 8 epochs, because in 1 epoch it will train
each training image 25 times, which is effectively 25 epochs. So the total effective number of epochs will be 200
this way. I will save every 1 epoch, so I will actually be saving every 25 effective epochs. Caption extension: if you use captions, you need to define their
extension. However, for training a person, I don't find
captions improving the quality. If you were doing fine-tuning with something like 200 subjects and 10,000 images, captions would be
necessary, but for teaching a person I don't find them
useful. Hopefully I will make a fine-tuning video
and explain the logic to you as well. So how can you improve your training quality
without using captions? You can improve your training dataset and
it will improve your training quality: make sure your images are the very best quality and don't have repeating patterns such as clothing
or backgrounds, and your training will become much better. Okay, select Cache latents and Cache latents
to disk; these will reduce your VRAM usage. Mixed precision will be BF16 since my graphics
card is an RTX-series card. Save precision will also be BF16. If BF16 doesn't work on your graphics card, then
you need to select FP16. You may be wondering what these are; this is related to machine learning
training. Normally, each number is stored at a certain precision,
which is FP32 by default. However, FP32 uses much more VRAM, and when
you use mixed (half) precision, the quality is almost the same. So to save VRAM, we are using mixed precision: BF16, BF16 selected. Number of CPU threads per core: I increased this but I didn't find much
improvement. Actually, there is an amazing file that you
can read: find the link explaining LoRA learning settings and
open it. This is a huge file which explains the majority
of the settings that you will see in Kohya training. This file is gold, believe me. Read it and you will learn so many
things about how LoRA works, how Stable Diffusion works, and what the training parameters and options are.
Believe me, you won't regret it; if you are interested in learning what
these parameters are, read this file. Okay, the seed is optional, you don't need it. Now the learning rate: this is really important. The learning rate depends on your selected
LoRA type and your optimizer. I am using Adafactor. This optimizer is supposed to be the one with the lowest
VRAM usage. How do I know? Type "Adafactor VRAM usage" into Google and you will get the "Efficient Training on a Single
GPU" page on Hugging Face. Search there for Adafactor and you will
get here: it uses a very low amount of VRAM, and to
get over its instability issues we are going to use these extra arguments: scale
parameter false, relative step false, and warmup init false. With those it becomes stable and
uses a minimal amount of VRAM. You see, in their example AdamW uses a 24GB GPU, Adafactor a
12GB GPU, and 8-bit quantization a 6GB GPU. However, 8-bit quantized DreamBooth
training is not supported yet as far as I know; it is currently available for large language
model training. So we pick Adafactor. We make the learning rate 0.0004, which is
equal to 4e-4 in scientific notation; however, we type it as 0.0004. For the learning rate scheduler I will use constant. As I said, read this file to learn more
about all of these options. So: constant. When you use the constant learning rate scheduler,
this value is 0. Okay, Adafactor is selected. Don't touch these two text boxes. Optimizer extra arguments: this is really important. In our readme file, copy these optimizer extra
arguments and paste them here like this.
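For reference, the Adafactor extra arguments mentioned earlier (scale parameter, relative step, and warmup init all set to false) are typically entered in that box as space-separated key=value pairs, something like this:

scale_parameter=False relative_step=False warmup_init=False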
Max resolution: this is also really important for SDXL. We are going to use 1024 and 1024. We could possibly use a higher resolution, and I will explore that, but for 12GB training you can't go higher, so use 1024 by 1024. Set Stop text encoder training to 0. Don't enable buckets. In some cases, when people enabled bucketing,
they saw the training speed decrease. I think the bucketing system has some flaws, some
bugs, therefore don't use it; and since we are training with a single
resolution, we don't even need it. Text encoder learning rate: I use the same learning rate for both the text encoder
and the UNET; as you are seeing right now, I made them equal. You can select full BF16 training. This is supposed to reduce VRAM usage, but
it won't work with all optimizers; still, it is fine to select it. Now Network Rank (Dimension): this is one of the most crucial parts. As you increase this, the LoRA will be able to learn
more details, more information. However, it will also increase your VRAM usage
and it will take more hard drive space; each
checkpoint file will become bigger. If you have an RTX 3090 or 4090, a graphics card
with 24GB, I suggest you use Network Rank (Dimension) 256. When you use this dimension, each of the
generated files will be 1.7GB. However, on 12GB GPUs you will get an
out-of-VRAM error, so I suggest you start with 32, and you should
be able to train with 32. If you are not able to train with 32, that
means your other applications are using your VRAM. How can you reduce that? Reduce your screen resolution; right click, open your task manager, go to
Startup and disable all of the startup applications. Do whatever you can to reduce VRAM usage: you shouldn't have more than 500MB of VRAM in use
when Kohya is not running. That way you will be able to use Network
Rank (Dimension) 32, even 64. I will do the testing on my second GPU, which
is an RTX 3060; it has 0 VRAM usage because it is my second GPU, and I am able to do training
with a network rank of up to 96. However, I tested this on one of my Patreon
supporters' computers and he was able to train with network dimension 32 on Windows 11, so you should also be able to train
with network dimension 32. This generates decent results; it is also the network rank dimension I
used to generate those images that I showed you at the beginning of the video. Network Alpha: this is not very well explained anywhere,
but it actually affects the effective learning rate: when you are training, you are
changing the weights of the model, and Network Alpha scales how strongly the
LoRA weight updates are applied. For a standard LoRA you should leave the Network
Alpha value at 1; if you use other LoRA types, then it changes. Okay, all of these are the default advanced configuration. In here, all you need to do is check Gradient
checkpointing, and this is really important, otherwise you will get an out-of-VRAM error; other than
that, you don't need to do anything else. Use xFormers is selected. You can also select memory efficient attention; however, it does not reduce VRAM usage. You can also try full FP16 training; I tested it and it didn't make any difference
in terms of VRAM usage. So these are all of the settings. There is one more thing: No half VAE should be selected for SDXL. Some people also asked me how they can continue
training at a later time. For doing that, you need to check save training
state, and if you want to continue from a previous training state, you need to use resume from previous
training state. In this tutorial, I will show you how to
do that. I checked save training state. I am not generating any samples during the
training; you will likely get an out-of-VRAM
error if you do that, especially on low-VRAM GPUs. So these are all the settings. Let's save them, and then you can click Print
training command. It won't start training; it will display
all of the settings, and it will display how many images it has
found. This is really important: you need to see all of these values correctly. So it says it found 13 training images and it
is going to take 325 steps for each epoch. Why? Because the repeat count is 25, and it
reads these repeat counts from the folder names. Then it says it will use regularization images,
and the regularization image count will also be 325, because the regularization images' repeat
count is 1: in each epoch it does 325 steps, and in each step it uses 1 regularization
/ classification image, therefore it will be 325. Training batch size 1, gradient accumulation steps 1; do not increase these two unless you have
to. In which case might you have to? If you need faster training, then you need
to increase them. However, it will reduce your training quality
and force you to find a more optimal learning rate, so your learning rate will change if you increase
your training batch size or gradient accumulation steps. Gradient accumulation steps is effectively a simulated
training batch size; I explained this in previous videos, and it is not very important right now. Hopefully I will make a video about general
fine-tuning and explain all of these things in more detail. So how many epochs will we train? We will train 8 epochs, then we will compare each of the generated
checkpoints and find the best checkpoint. I will show you how to do that. The max training steps value is calculated like this: in each epoch, it will do 325 steps (1 step means one GPU execution). Why 325? 13 images times 25 repeats. Then it is divided by 1, because our training batch size is 1, and divided again by 1.0, because gradient accumulation steps is 1.0. Then it is multiplied by 8, because we are going to train 8 epochs, and multiplied by 2, because we are using classification images. So in each step, it will also do training
of classification / regularization images, which will further fine-tune our model, because we are using ground truth images.
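As a quick sanity check, the whole calculation for this run works out like this:

13 training images x 25 repeats = 325 steps per epoch
325 / 1 (batch size) / 1.0 (gradient accumulation) x 8 epochs x 2 (regularization) = 5200 total steps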
So what if you don't have a subscription to my Patreon? Then you can generate your class images from
Automatic1111 web UI. So let me start it. It is starting my Automatic1111 web UI. Okay it is started. You see it has taken the next available port
because our Kohya GUI is already using this port. Let's open it. Let's say you are going to do training of
a woman. I don't have regularization images for women
yet, but it is on my to-do list and hopefully I will release them on my Patreon. All you need to do is type photo of a woman. Usually use a simple prompt like this, because you want
images that are as normal as possible. Set your width and height to 1024, then
right click and Generate forever. Generate as many as you need. Here is an example image; I also set the sampling steps to 30, but you
can see this image is nothing like a real photo. This approach is also useful when you want to keep
the original style of the model, or a certain style of a model: you can also use styled images, and whatever you use will further fine-tune the model
toward that style or that quality, so it is really important. And finally, it will list the whole command
like this. This is basically generating the command line
interface command for you and executing it. So the only difference between the GUI and the original
Kohya SS command-line scripts is just this. You can also copy this command with Ctrl+C, paste
it into a text editor, and change every parameter you wish. For example, some people asked me how they
can train only the UNET. So this is the --train_unet_only argument; append it to the end of the training command and copy it. Enter your Kohya SS GUI folder, enter the venv virtual environment folder,
enter Scripts, start a new CMD, and type activate. After you have activated the virtual environment, paste the command and hit Enter, and it will start training.
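A minimal sketch of that flow in CMD, assuming Kohya was cloned into a folder named kohya_ss; the long training command itself is the one printed by the GUI and is shown here only as a placeholder:

:: activate the Kohya virtual environment from the repository root
cd kohya_ss
call venv\Scripts\activate.bat

:: paste and run the command printed by the GUI from this folder,
:: so the relative path to sdxl_train_network.py resolves correctly
accelerate launch ... "sdxl_train_network.py" ...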
We will see in a moment. It gave me an error. Why? Because the command refers to this sdxl_train_network.py
file with a relative path,
so it is not found from inside the Scripts folder. So I will move into the main Kohya folder like this
and I will copy and paste the command again. Command pasted again, hit Enter, and now it should work. Whenever you get an error, you should look
at where the error happened. Sometimes people ask me about this. Okay, we got
another error: it says that sdxl_train does not support
training the UNET only. However, the documentation says that it is supported,
so this is probably a bug; if they fix it later, you can try this. Okay, let's close this, return to our GUI instance, and
start training, and we will see that it starts training. This time it will execute the training, not
just display. However, this training will happen on my main
GPU right now. So how can you use it on your second GPU? For demonstration purposes,
and for low VRAM, I will now begin training on my second GPU. So before closing the existing running CMD
window, I will save my configuration. Then I will close both of these windows. We are going to use the set CUDA_VISIBLE_DEVICES
environment variable. You can use this with pretty much every application. So where do we need to add this command? We need to add it into the gui.bat
file, which is here. Right click, click Edit, and add that line
after the call to the activate.bat file, which activates the virtual environment. Save it, then start your GUI.
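For reference, a minimal sketch of that edit, assuming the GPU you want to dedicate to training is device index 1 (numbering starts at 0, so 1 is the second GPU); the new line goes right after the line in gui.bat that calls the venv activation script, and the rest of the file stays unchanged:

:: restrict Kohya to the second GPU only
set CUDA_VISIBLE_DEVICES=1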
Now we will see that it displays the RTX 3060 on the screen with its available VRAM and other details. So let's reload our interface. LoRA configuration: open. We have saved our configuration here. Select it, then it will automatically load
everything, and hit Start training. Let's look at the VRAM usage of my RTX
3060. Since my RTX 3060 is fully empty, with no
other application using it, I will get the best performance, and we will see exactly how much
VRAM it is going to use. Okay, it's starting for the first time. It will cache all of the image latents to disk
as permanent files, since we selected that option; it will take some time. Normally, using multiple threads is supposed
to speed this up, but it does not; maybe it is a bug. Caching latents uses the CPU, however, it
is single-threaded. I think it is actually supposed to follow the
number of CPU threads per core option we defined here, however, it still uses a single thread,
therefore it is slow. I reported this to Kohya and I hope he fixes it, because if we had
many images, it would take too much time unnecessarily. This caching of images to disk happens
only once; then it will start training. You see it is starting to use more dedicated
GPU memory. Your shared GPU memory will not be used, so
don't depend on it. With network rank 32, it is using 11.5 gigabytes
vram of my gpu and the speed is 2.4 seconds per it. This is probably the maximum speed that you
are going to get currently. However, if they improve the script's library
dependencies, it may get better. But for an RTX 3060, this is currently about
the maximum speed you are going to get, and it's a decent speed. You see, my training will be completed in
3 hours 30 minutes. When you run this script for the first time, you will
see that this duration decreases faster than it should, because of how the estimate
is computed, so wait about 10 minutes to get the final seconds-per-iteration
and the real duration of your training; until then it will keep decreasing. When you check your training folders, you
will see these files: they are the cached image latents, since
we cached the latents to disk as well, and when I do another training it won't generate
the cached latents again, it will use the existing ones. Okay, it has come down to 2.38 seconds per iteration, so this is not final yet. If you get an error like this, "image file
is truncated", that means one of your training or classification images is corrupted. This usually happens when you use RunPod or
another cloud service and upload your images but the upload does not complete. So if you get such an error during training,
make sure that all of your images are perfectly valid. I have an image validator script on my Patreon; you can also use this script to verify the validity
of all of your images. The link to the image validator script is also
shared in the GitHub readme file. The training is continuing; now it is displaying 3 hours 21 minutes, and so far 5 minutes have passed. If you get any error, join our Discord channel; the link is here. You can also follow me on Twitter from here. I have a Udemy course as well, and if you support
me on Patreon I would appreciate that very much. You can also follow me from LinkedIn and I
also have a CivitAI profile as well. All of the links are here; you will find these links in all of my tutorials. I appreciate it if you follow me or support me. It has been almost 10 minutes and the estimate has almost
stabilized. So you see it is going to take 3 hours 26
minutes for 5200 steps. Kohya has added a stop training feature as well. Let's stop it and try network rank
dimension 8: if you can't make 32 work, you can reduce
your network rank dimension to reduce VRAM usage. And let's start training again to see how
much VRAM it will use. You see, after I stopped the training, VRAM
usage dropped to 0. Then it starts training again. You see the screen has frozen; I hit Enter and it continues. Now it's starting. You will also get this message: "error caught was no
module named triton", because the Triton optimizer is not available
on Windows. So you will also get this error; don't worry, it is just an optimizer, therefore we will get a slightly lower
speed, but it will work perfectly fine. I hope the Triton developers release packages
for Windows as well; I even opened an issue on their development
GitHub. You will also see that it says it is using the DreamBooth
method even though we are doing LoRA training, because LoRA is an optimized form of DreamBooth
training; LoRA is an optimization technique used in
machine learning training. Okay, this time it won't re-cache the images because
it already has the cached images. It is starting; let's see how much VRAM it will use this time. You see the VRAM usage is reduced by 300 megabytes: it was 11.5 gigabytes, now it is 11.2 gigabytes, and the speed readout
is still changing, because we need to wait about 10 minutes to see the final training speed. Okay, the VRAM usage settled at 11.2 gigabytes; I think it is stabilized, and the speed has also improved. What if you still can't make it work? And what if your VRAM usage is too high? You will get terrible speeds, such as 30 seconds
per iteration, if it uses too much VRAM, because it will bottleneck the GPU. Sometimes you won't get an out-of-VRAM error; it will continue training, but it will be too
slow. This happens when it uses too much VRAM. Let me demonstrate. I will stop the training, then I will make the network rank dimension
96; actually, let's try 108. Okay, now it is using 11.8 gigabytes and you
see the speed has decreased significantly. Now I will increase the network rank
dimension further to bottleneck the GPU VRAM even more. Stop training, let's try a network rank of 116. Okay, with 120 network rank dimension it
is now taking almost 7 hours. So let's say that no matter what you do, you are
getting an out-of-VRAM error: make the network rank dimension 4. This is really low and the results will not be very good, but if this is
all you can do, you need to do it. Then start training and let's see how much VRAM it will use: it is using 11.2, so it is the same as rank 32. Therefore, find your own optimal rank. Okay, now I will continue the network rank 32 dimension
full training. So it has been over 1 hour and we have saved
2 checkpoints. You see we are at epoch 3. You will notice that at epoch 1 the first
checkpoint was saved at 650 steps. Why? Because it is the number of training images multiplied
by the repeat count (13 x 25 = 325), and since we have classification images it
doubles to 650 steps per epoch. And since we are saving every epoch, the
model checkpoints are generated. Where are they generated? They are generated in the folder we
defined. We are also saving state for continuing, and
each saved state is 8 gigabytes; I think the saved state size is independent
of which network rank dimension you use. The checkpoint files are only around 200 megabytes,
but each saved state is 8 gigabytes. I will not wait until this training is
completed, because I have done that before. I will stop training and show how you can continue from a saved
state because it was asked of me. So in the Kohya GUI, go to advanced configuration
tab, which used to be at the bottom (now it is at the top). Then find the save training state section, and
here you will see resume from saved training state. Click this folder icon, go to the folder where
you have saved your training, inside LoRA, inside model. Let's continue from state 2, which was generated
at the second checkpoint. Okay, select the folder; now it should continue. Let's hit Start training and see what
happens. Okay, you see the CMD window is frozen, so I need to
hit Enter; okay, now it's continuing. Whenever you start a training, it will also
generate a unique JSON file. This JSON file will be saved here, as you are
seeing, so even if you don't save your configuration,
you will have a saved JSON file that you can load. Okay, we should see it loading the saved
state. We can see "resume training from
local state", and this is the state it is loading and resuming from. This is how you resume your training with
LoRA when you are using the Kohya GUI. So now I will close my Kohya GUI. I have previously completed checkpoints here; you see the safetensors
files from my 12GB-settings run. At the end you will also get a 12GB-settings
safetensors file without any additional numbering; this is the last file that it will generate. Let me move them into new folders. So this is our fresh installation: models, Lora. I will just paste them here for demonstration; I will move them into the model folder where
they would have been saved if we had continued our training. So let's start our Automatic1111 web UI, and
I will show how to do checkpoint comparison and how to use those LoRAs. Okay, double click webui-user.bat file. The Automatic1111 is starting. This is the commit hash that I am using right
now, and these are the web UI arguments that I am using. I don't have any extensions installed; you don't need the Additional Networks extension
anymore to use LoRAs with the Automatic1111 web UI, so don't use it. Okay, it has started on this URL here, and our SD web UI is loaded. So how are you going to use your trained LoRA? When you click this icon, it will open this
tab, and here you see Textual Inversion (another training methodology), Hypernetworks (also another training methodology; I have a tutorial for Textual Inversion but not for Hypernetworks), Checkpoints (these are our models), and Lora, which is where we will pick our LoRA. You see it also displays the folders
inside the Lora folder, and it displays all of the LoRA checkpoints that I have. These "12GB settings" ones are LoRAs
that I previously trained, and you see 12vram: this is our first checkpoint from this video's
training, and this is the second checkpoint. How are we going to know which checkpoint
is best performing? So we have trained our face with ohwx man: ohwx is our rare token, man is our class token. Let's type photo of ohwx man and nothing else. Then we need to add the LoRA checkpoint. For adding the LoRA checkpoint, either you can type its full tag or you can
click this icon and then it will append it here. After it is appended, click Generate.
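For reference, the final prompt with the LoRA tag appended looks roughly like this; the file name 12vram-000001 is an assumption based on Kohya's default epoch-numbered naming for the model output name used in this video, so substitute your own checkpoint name, and the trailing 1 is the LoRA weight:

photo of ohwx man <lora:12vram-000001:1>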
Currently we are using 512 by 512, therefore we will get a very bad quality image, as you are seeing right now. With SDXL you need to use the native trained
resolution or a bigger resolution. Okay, let's hit Generate, and we got the image. It is not very good because of the prompt. It has also pretty much memorized the background,
the clothing. So this is our first attempt to see what we
are getting. We will make this much better. Okay let's try with another checkpoint like
this and let's also change the prompt. I have shared some prompts here, so let's
copy the prompt from our GitHub readme file. Let's paste it. Okay, this is going to use the seventh checkpoint,
then let's copy the negative prompt and let's paste it here. Let's make the sampling steps 30 and hit generate. Okay, it is coming up. This is not a cherry pick. I will show you how to get very good quality
images as well. Okay, this is the image. Let's generate 10 and then pick the best one
to assign as a preview for our LoRA. How are we going to do that? Let's hit Generate; this will generate 10 images one by one, and you can see the entire progress here. It is 2.88 iterations per second. Why? Because I am recording a video right now with
Nvidia Broadcast, so that is using a lot of GPU power. Okay, I have got 10 samples; let's see
which one looks decent. Let's say this image looks decent
and I want to set it as the preview of the seventh LoRA file. How am I going to do that? Let's open the LoRA tab again and go to
the seventh checkpoint here. Click this icon, Replace preview, and you see
it replaces the preview with the selected image; save, and now this is the preview of my
LoRA. So how am I going to find the best LoRA checkpoint? First of all, you need to decide your prompt. This is really crucial. For finding the best LoRA I will use this
prompt; copy and paste it. Then I will copy and paste the negative prompt as well. Make sure that you replace my LoRA checkpoint
name with yours; this is mine, so I will delete it. Let's go to the LoRAs, for example starting from the first one: this is the first LoRA file. Moreover, to make it easier, I will do
a manual renaming. This is where my LoRAs are located; the final LoRA is named like this, and I will make its naming the same as the others,
so I am renaming it to 8. So this is actually the eighth checkpoint. Then all we need to do is change
this number. Copy it, go to the x/y/z plot, and in
here use Prompt S/R (search and replace). It will look for the first string of the list inside the prompt
you have written, then replace it with each of the other values in turn. So I am going to check each one of the checkpoints,
as you are seeing right now; it will test all 8 checkpoints. Make the batch count something like 12 so it will generate 12 images for each checkpoint.
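As an illustration, if the prompt contains the tag <lora:12vram-000001:1> (again, an assumed file name; yours may differ), the Prompt S/R values would be a comma-separated list whose first entry is the string already in the prompt, something like this:

12vram-000001, 12vram-000002, 12vram-000003, 12vram-000004, 12vram-000005, 12vram-000006, 12vram-000007, 12vram-000008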
Moreover, CFG 9 sometimes works better
for LoRA training, at least for me, but it may not be the same for you. This is the weight of the LoRA; if you reduce it, the effect of the LoRA
will be reduced. This also increases the emphasis on these
two tokens. These are our trained rare token and class
token. Okay, this is all optional and up to you; you need to find a good prompt as a base and
do this checkpoint comparison. For LoRAs, this is the way. Then let's set the grid margins to 50 and hit
Generate. We will see the progress here: it says that it will generate 96 images on
an 8 by 1 grid, with 12 images per cell. Okay, let's just wait now. So the x/y/z plot generation has been completed. Let's open it from the folder: click here, it will open the text-to-image folder; go to
outputs, go to grids, and in here you will see the final grid file, which is 110 megabytes. I will open it with Paint.net, and here it is. Now this part is totally subjective, and you
need to figure out which checkpoint is looking best. When we consider the first checkpoint, it
is certainly under-trained. Let's move to the second one. This is also looking like under-trained. So let's move to the third checkpoint. You see this is the third checkpoint. It is looking decent. Let's move to the next one. So this is checkpoint 4. It is again looking decent. With LoRA training every checkpoint produces
very different images. This is not the case when you do DreamBooth
training, but with LoRA the results are very different even when you use the same seed values. So let's look at checkpoint 5. Okay, this is a really good picture. Actually, let's compare with our training
data set. So the training data set is here. And yes, let's open this image. Okay, you see this is the LoRA generated image
and this is the real image. Okay, this is also good. Let's look at checkpoint 6. Here we are seeing checkpoint 6; let's look at each example. You need to find your best checkpoint. This is totally subjective, as I said, so I
can't say whether it will also be the best for you. And these are the other images. Okay, there is something very important at checkpoint 8: we see overtraining. How? It has even memorized the background, this thing
on the wall; or here, it is fully memorized. It has lost its flexibility: even though I specified
a white suit, it is not even completely white. This background is also pretty much memorized, so checkpoint 8 is definitely over-trained; this is a clear indication. Let me show it to you from the training data
set: you see this is on the wall, and it is also
here on the wall too. Even this detail, yeah, you see, it is completely
memorized. This is why you should have varied backgrounds. So we eliminate checkpoint 8. Checkpoint 7 is still a little bit memorized; it's decent quality, but still a bit memorized. Okay, let's look at checkpoint 6. On checkpoint 6 we don't see such
memorization and the results are also very decent. Still, there is some memorization, but not
at that level. And on the checkpoint 5, the results are really
good. So either checkpoint 5 or checkpoint 6 are
good. We can do another test to see which one
generalizes better. This was for realism; for realism, we can use checkpoint 5 or checkpoint
6, and we can even use checkpoint 7. Actually, I used this one to generate thousands
of images; I will show you. So another test: what could it be? In the GitHub file, when you click the Stable Diffusion
link here, it will open my main repository. Please also Star it. We have reached over 700 stars. It is really important. Please also fork it and also watch it. If you also become my sponsor I would appreciate
that very much. In here you will see an amazing prompt list for
Stable Diffusion; I keep updating and improving this file, and recently I also started adding pictures
along with the prompts. So I will test this prompt because it looks
pretty decent. When you click this image it will show you
the output. Let's do another test with this prompt; select it and copy it. This prompt doesn't have any negatives, so let's
paste it here, delete the existing LoRA, delete
the negative prompts, and hit Generate again to see what we come up with. Since the x/y/z plot is selected, it will compare
each one of the checkpoints. Actually, you can remove the first 4 checkpoints
to speed up, but for this experimentation I will keep it so we will see the difference
between each one of the checkpoints. By the way, the last generation took about
18 minutes for all of the checkpoints. The new test is also completed. Let's check out the results so it is also
inside the grids. Let's open it. Okay here, let's re-evaluate. So this is the first one. Okay second checkpoint, third checkpoint,
fourth checkpoint, fifth checkpoint, the sixth. It is still pretty amazing. Especially this one. Okay the seventh. Wow. The seventh is looking even better I think. Yeah, really, really good, really good. And the eighth checkpoint. The eighth checkpoint was over-trained so
the seventh is also really good. So this is how you can evaluate your results
and see at which point the model starts to become over-trained; it is totally up to your
taste. I find that up to 200 effective epochs delivers
good results, from my past experience as well. So yesterday I used this prompt, changing
the suit color, and generated 2000-plus images; you are seeing them right now, starting from 0 and going to the very bottom, up to 2471. Finding the best images among this many
images is really, really hard. After looking for a while you become
desensitized to the face, right? You won't even be able to recognize yourself. So what can you do? I have an amazing tutorial that will sort
your generated images based on a reference image. The tutorial link is here. You can watch it. Learn it. I have shown the script that I use. Alternatively, you can download the script
from here from this Patreon post. There are also instructions shared in this
post as you are seeing. So either you can become my Patreon supporter,
download it, or watch this tutorial to learn. This script is very easy to use: after you install the requirements, you just
give it two folders, like you are seeing, the reference images folder and the generated images
folder, and based on your given images it will sort them. Now I will show you the results. At first I used this as a base reference
image, this real image from the training images; it will sort all of the generated images against it. It doesn't have to be from your training images: you can provide any reference image, and based
on face similarity the AI will sort all of the generated images. Now let's look at the sorted images. Here we are seeing the sorted image results. You see they are amazing quality, and based
on this sorting you can find your best images and use them on your Linkedin profile, on
your Twitter profile, wherever you need them. Can these images be improved? Yes! If I had used a better training data set,
probably I would be able to get better results. Moreover, I find that the workflow in this earlier tutorial is still
better than SDXL for realism; however, SDXL is much better for styling than
SD 1.5 based model training. I will also explore DreamBooth training of
SDXL, and I am expecting even better results than what I have obtained with the SD 1.5 based
workflow. So these were the results for the first reference
image. Let me show you the second reference image. So this is the second reference image. Based on this second reference image, you
can see the new sorting. You can give any reference image and based
on that reference image, they will get sorted. For example, this is the reference image. This is the SDXL LoRA generated image. I think it can be still further improved and
when we have SDXL refiner training I think we will get much better results. But it is still super similar. Super high quality. The clothing is very high quality and with
even a better data set, I think we can get better results. I will also explore higher-resolution training, not 1024 x 1024; I will use an even higher resolution and test the results. So here is the third reference image. You see, I am looking in another direction in this image, and when I look at the sorted results, you will notice that most of the most similar images are looking in the same direction. So with this approach, you can find images that look in a similar direction. It is very convenient to use. Moreover, you can put multiple reference images
into a single folder, and then the script will sort the generated images based on the average similarity to all of them. That is also possible.
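As a rough sketch of that multi-reference idea (again, not the exact Patreon script), the score for one generated face could simply be the average distance to every reference face in a folder; the folder name below is a placeholder:

    # Sketch: mean distance from one candidate face encoding to every reference face in a folder.
    import os
    import face_recognition

    def average_face_distance(reference_dir, candidate_encoding):
        ref_encodings = []
        for name in os.listdir(reference_dir):
            image = face_recognition.load_image_file(os.path.join(reference_dir, name))
            found = face_recognition.face_encodings(image)
            if found:
                ref_encodings.append(found[0])
        # Lower average distance means the candidate is more similar to the reference set.
        return face_recognition.face_distance(ref_encodings, candidate_encoding).mean()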
Watch that tutorial to learn more and use these scripts if you wish. If you become my Patreon supporter, I would appreciate that very much, because my YouTube revenue is very low and your Patreon support is very important to me. You may be wondering about the styling capability
of the model. So in this Patreon post, I shared how to get
amazing prompts with ChatGPT for Stable Diffusion. Let's download the prompts. Okay, let's open it. There are some prompts here. You can test all of these prompts, generate
images, and then compare them. Let me show you the results. Okay, here are the results of those random prompts generated entirely by ChatGPT. There are some very, very interesting and different prompts. There was one prompt I liked very much; let me find it. For example, you see, they are very, very interesting. Okay, let's work on this prompt because I
liked it, and I will improve the face and show you how you can get much better quality. Since this is a distant shot, you see the face quality is not very good, but we can make it better. So to be able to reuse it, I will use the PNG info tab in the Automatic1111 Web UI. Let's drag and drop the image into here, then send it to the text-to-image tab. Don't forget to change your LoRA version accordingly, so let's pick our best LoRA, which is checkpoint seven. Okay. Then there is a certain seed. However, this image was generated with another LoRA, so probably we won't get the same result. So let's first try and see whether we get it. I want to show you the logic of how to fix
and improve the image. Okay, it is getting somewhat similar, but not quite the same. Okay, I could use the same LoRA to get the same image. Alternatively, let me generate some images and find a decent one, so you will see what these settings are capable of. I will right-click and generate forever until I get a decent one. Oh, by the way, don't forget to make the seed random. So I will cancel, make the seed random, and then generate again. Let's skip this one. Right-click and generate forever. Okay, the images are being generated. For example this one; it is looking decent. Or this one; we can improve this one significantly, and it is also looking very cool. Okay, this one is also looking really good, except it has some missing parts. Okay, here is another image which can be improved. This one is also pretty cool. Okay, here is another one. You see, SDXL is extremely flexible. It generates this realism, then this styling
with the same prompt, or this realism. This is also a realistic image. It is so capable of outputting very different concepts and very different art styles with the same prompt. Okay, you see, this one is also like a dragon merged
with a horse. Okay, let's try to improve this one. I won't wait too long; this is just to show you the logic. So let's cancel generate forever, then go to the PNG info tab. Delete the older one, drag and drop the image, and send it to the text-to-image tab. Now we need to test the high-resolution fix until we get artifacts. What do I mean by that? For example, let's upscale by 50 percent, make the denoising strength 50 percent, and then hit generate. When you start getting the repeating issue, that is the point where you need to stop. Until you hit that repeating issue, you keep trying a bigger upscale-by value.
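If you prefer to run this test over the API instead of clicking in the UI, here is a minimal sketch of the same idea. I am assuming the web UI was started with the --api flag; the prompt and the other settings are placeholders that you should replace with your own:

    # Sketch: regenerate the same seed at increasing hires-fix scales to spot where
    # the repeating artifacts start. Requires the web UI to be running with --api.
    import base64
    import requests

    URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"
    payload = {
        "prompt": "your prompt here",      # placeholder prompt
        "steps": 30,
        "seed": 1234,                      # keep the seed fixed so only the scale changes
        "width": 1024,
        "height": 1024,
        "enable_hr": True,
        "hr_upscaler": "Latent",
        "denoising_strength": 0.5,         # the 50 percent denoising strength
    }

    for hr_scale in (1.5, 1.6, 1.7, 1.8):  # 50, 60, 70, 80 percent upscale
        payload["hr_scale"] = hr_scale
        image_b64 = requests.post(URL, json=payload).json()["images"][0]
        with open(f"hires_test_{hr_scale}.png", "wb") as f:
            f.write(base64.b64decode(image_b64))

Then you compare the saved images and keep the largest scale that does not show the repeating problem.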
Okay, with a 50 percent upscale there is no noticeable repeating problem, so let's try 60 percent. I will increase it to 60. Generate. Okay, the image has been generated. Let's look at it. Can we still say it is fine? Yeah, we are starting to see some deforming here and here, although it is not very noticeable. So let's use 1.5, which is the 50 percent increase. What are we going to do next? Let's go to the PNG info tab, delete, load the image, and send it to the inpainting tab. Okay, the same settings are loaded. What we need to fix first is the
hand, because it is not looking very good. So I will just mask it like this and select "Only masked" from here. Let's not change anything else and try like this, with the prompt "a hand of a male". Okay, let's generate. By the way, it will only inpaint the masked area: inpaint masked, masked content original, only masked padding 32 pixels. You can also increase this padding and try different values, and you can also increase the denoising strength to see the effect.
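For reference, the same inpaint settings can also be sent over the API. This is only a minimal sketch under the assumption that the web UI runs with --api; the image, mask, and prompt below are placeholders:

    # Sketch: inpaint only the masked region ("only masked", original content, 32 px padding).
    import base64
    import requests

    def encode_file(path):
        with open(path, "rb") as f:
            return base64.b64encode(f.read()).decode()

    payload = {
        "prompt": "a hand of a male",                 # placeholder inpaint prompt
        "init_images": [encode_file("image_to_fix.png")],
        "mask": encode_file("hand_mask.png"),         # white where the area should be repainted
        "denoising_strength": 0.5,
        "inpainting_fill": 1,                         # masked content: original
        "inpaint_full_res": True,                     # inpaint area: only masked
        "inpaint_full_res_padding": 32,               # only masked padding, in pixels
        "seed": -1,                                   # random seed
    }
    result = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload).json()
    with open("inpainted.png", "wb") as f:
        f.write(base64.b64decode(result["images"][0]))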
All right, let's look at the results. Click this icon; it will open the image-to-image output folder, and when we zoom in, this is the image. It is not looking very good, and the tone is not matching very well. So what can we do to fix this? First of all, make the seed minus one, so every time we will get a different result. We can also zoom in and try to mask it more carefully. When you hover your mouse here, it will show
you the options, you see. While I am holding my Alt key, I am zooming in. While I am holding Ctrl, I am able to make the masking area smaller, so let's mask it in a bit more detail. Okay, like this. Maybe like this. All right. Let's also fix the error in our prompt. While holding F, you can move the canvas. Okay, I think we need to be in the canvas section. Yeah, now it is working. All right. We have 50 percent denoising and 32 pixels of padding. Let's try several more times until we get
a decent result. During the generation, it will show you the
zoomed in generated area like this. Okay, after several tries, I got a very decent
image. As you are seeing right now, there is still some tone mismatch, but when you look from a distance it won't be very visible. This is the logic, and it can be perfected. So I changed the prompt to: ohwx man holding bike arm. You see, I defined it more precisely. Then we need to send this image to the inpaint tab again so the image here gets updated; also, click this to update it. Then we need to use the original prompt we used to generate this image to fix the face. So the original prompt was this. Now let's zoom in and mask the face, so I will
make the masking bigger. Let's mask it like this. There is also an extension called ADetailer that does automatic masking and automatically inpaints faces. Hopefully I will make a tutorial about that as well, but not in this one. All right, let's try again. Okay, even on the first try we got a decent
image. Now all you need to do is try several times until you are satisfied with the result. You see, even on the first attempt I got a decent image. This is how you can improve distant-shot generated images of yourself. Previously we had to use the Kohya-developed sd-webui-additional-networks extension, but you don't need it anymore. Now the Automatic1111 web UI supports all of the LoRA types automatically, which is great. I added the link to that Patreon post, where
I explained how to get amazing prompts for testing. It is here. So let's say you want to use the same training command that I used. It is shared here. Copy it, then paste it into any text editor. Then you need to change the folder paths. Make sure these folder paths are properly prepared in your case, otherwise it won't work. After you have done this, all you need to do is copy the command, activate the Kohya virtual environment, and execute it, as I showed at the beginning of the video.
I also have some other prompts here; you can play with them. You see, in some cases you need to reduce the weight of the rare token. Let's try this one, for example. Let's copy it. There are no negatives for the GTA 5 one. Okay, let's paste it. Let's disable the high-resolution fix, make the
seed random, delete the negatives, and let's generate 12 images. Okay, here we got the results. Not all of them are very good, like this one. This is decent. This is decent, as you are seeing right now. This one is not related. I like this one. Okay, this one is also decent. This one is also very decent. Let's increase the CFG scale to 9 and try again. Okay, here we got the CFG 9 results. As we are seeing right now, they are decent, but not very good. Okay, this one is decent. It is following the face strictly, but it is not very stylized. Maybe we can reduce the weight of the LoRA so it will follow the style more, like this. Let's also remove the prompt weighting here and try again.
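In the prompt, the LoRA weight is the number inside the LoRA tag: for example, <lora:ohwx_sdxl:0.9> instead of <lora:ohwx_sdxl:1> gives the LoRA 90 percent weight (the file name here is just a placeholder for your own LoRA). If you want to compare several weights without clicking around, a minimal sketch over the API could look like this, again with placeholder names and a web UI running with --api:

    # Sketch: generate the same prompt at several LoRA weights to compare styling.
    import base64
    import requests

    URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"
    base_prompt = "ohwx man as a gta 5 character"      # placeholder prompt

    for weight in (1.0, 0.9, 0.8):
        payload = {
            "prompt": f"{base_prompt} <lora:ohwx_sdxl:{weight}>",  # placeholder LoRA file name
            "steps": 30,
            "seed": -1,           # random seed
            "cfg_scale": 9,
            "batch_size": 4,
        }
        images = requests.post(URL, json=payload).json()["images"]
        for i, image_b64 in enumerate(images):
            with open(f"lora_weight_{weight}_{i}.png", "wb") as f:
                f.write(base64.b64decode(image_b64))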
This styling capability totally depends on your training data set, the checkpoint you use, and the weight of the LoRA. If it doesn't get styled enough, try to reduce the LoRA weight, and especially try to improve the variety of your training data set. As I said, use different backgrounds, different clothing, and very high quality images in your training data set. Okay, here are the results when we give 90 percent weight to the LoRA. This one, this one, this one, this one is decent,
this one, this one, this one, this one, this one not very related, this one and this one
and this one. So you see, you can reduce the weight even further and try again. These are not cherry-picked; you know, with Stable Diffusion it is a numbers game. You need to generate a lot of images to get the perfect image that you are looking for. That is why my similarity script is very important and very useful. And we got the results for 80 percent weight. Okay, let's look at them. Yeah, it is becoming more and more GTA 5 style, so you need to find a sweet spot between likeness and styling. And here are the results. It generated one black-and-white image, I don't know why, but the results are decent. This one doesn't look very much like me, but you see it becomes more like GTA 5, especially maybe this one. Thank you for watching. I hope you have enjoyed it. Please subscribe, join, and support me. It is really important. Leave a comment, ask me anything you wish, like the video, share the video. All of these things will help me significantly. We have an amazing Discord channel. Click this link and it will open this page. Join our server. We have over 4000 members and we are growing
each day. We have some experts, and we are sharing knowledge with each other. Follow me on Twitter: click this link to open my profile. I hold a PhD. This is my Twitter. You can also follow me on LinkedIn: click this link. This is my LinkedIn profile. You can follow me and connect with me here. I also started a CivitAI profile. As I said, everything you need will be in
this GitHub file. The link of this GitHub file will be in the
description and the comments of the video. Read this GitHub file very carefully while you are watching this video or after watching it. This file will be kept up to date; I will update it if necessary. Currently, when installing Kohya on Windows, there is nothing else you need to do; just install it as I have shown in the video. If something changes, I will most likely update these two sections. These extra arguments are important. They are for Adafactor; if you use other optimizers, they will probably not work. This is the JSON config file that I used. This is the training command file that I used. There are some prompts here; you can use them. Also, this is my pip freeze information. Hopefully I will see you in another video. The raw recording of this video took more
than 2 hours. You can imagine how much time I have spent. I hope you consider supporting me on Patreon. Thank you so much.