How to correctly train your Stable Diffusion model

Video Statistics and Information

Captions
So here's the thing: you woke up one day and saw all these videos of people transforming themselves into beautiful AI avatars, all those "I just turned myself into artwork using AI and the results are insane" posts. [Music] I won't lie, I'm broke; I don't have money to buy a subscription to Lensa, so I went with the next best option: Stable Diffusion. After watching a few tutorials I tried creating my own AI avatar, and after hours of struggle I successfully trained my very first model. And the result was... what just happened? This is not what I expected. I followed each and every step, yet my generations are nowhere close to theirs. Forget about being good, what the hell even is this? What had I done wrong? How do you train your own Stable Diffusion model correctly? [Music]

But before we even start training our Stable Diffusion model, you must first know how Stable Diffusion actually works. Stable Diffusion is nothing but a program that generates an image from noise; it's basically like looking at a cloud and figuring out what its shape actually is. This is only possible because of diffusion models, which are generative models designed to generate new data similar to what they have seen in training; in the case of Stable Diffusion, that data consisted of images. The way this deep learning model works is that it takes an image and, step by step, puts noise onto it until we end up with pure noise. Once the picture is completely covered with noise, the diffusion model steps in and tries to undo the noise to recreate the real image. In other words, the system recreates a totally new image that looks like the images it was fed during training.
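To make that noising idea concrete, here is a minimal sketch of the forward diffusion step in PyTorch. It is an illustration only, not the actual Stable Diffusion code, and the schedule values (a linear beta schedule over 1000 steps) are common textbook assumptions rather than anything shown in the video.

```python
import torch

def forward_diffusion(x0: torch.Tensor, t: int, T: int = 1000) -> torch.Tensor:
    """Return x0 after t of T noising steps (closed-form DDPM-style forward process)."""
    betas = torch.linspace(1e-4, 0.02, T)          # noise added at each step
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative fraction of signal kept
    noise = torch.randn_like(x0)
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
    return alpha_bar[t].sqrt() * x0 + (1.0 - alpha_bar[t]).sqrt() * noise

# A fake 512x512 RGB "image": by the last step it is essentially pure noise,
# and the trained model's job is to learn how to reverse this process.
image = torch.rand(3, 512, 512)
noisy = forward_diffusion(image, t=999)
```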
To train your own Stable Diffusion model, all you need is this Colab notebook, whose link is in the description below; open it now. The reason we are using this Colab notebook is that through Colab we can use Google's remote GPUs, which are far superior to the average GPU in a home computer. You can run Stable Diffusion on your own PC, but I would highly discourage you from doing that. Why? I already made a video about it, but to summarize: if you are rich and can afford a PC with a GPU that has more than 10 GB of VRAM, which, if I might remind you, would itself cost you thousands of dollars, you could definitely try. But for the poor peasants like me who haven't broken through the Matrix, just use this Colab notebook; just look at the GPU that you get with it.

After opening the Colab notebook, just follow the steps shown on screen. First, click on this play button; this will check your GPU and VRAM. The next step is to paste your Hugging Face token. To get a Hugging Face token you first need a Hugging Face account; after that, go to your profile, click on Settings, then Access Tokens, and create a new token. Let's name it 'stable diffusion world', change the role to 'write', and hit Generate. Now copy this token, paste it over here, and hit Run. After that we move to the next step, 'Settings and run'. Here, click on 'save to drive', then go to the output directory and, instead of 'zwx', write the name of your model, like 'siddhant 3', then hit Generate. This basically creates a folder named 'siddhant 3' where all your training pics will be saved.

After doing all of that, we can actually start training our model. Here we need to make some changes to the code. For example, next to the instance prompt you will see 'photo of zwx dog'; this is what your prompt is going to be, so instead of 'photo of zwx dog' I wrote 'photo of siddhant 3 person'. Make sure your prompt does not use a common name like Shahrukh or Salman, because there are millions of photos related to them; even if you train your own model under the name Shahrukh, it might give you a photo that resembles the actual Shahrukh Khan. So try to give your prompt some sort of unique name, as I did here with 'siddhant 3 person'. After that, down at the class prompt, we write 'photo of a person' to tell Stable Diffusion that we are creating a model of a person, not of an object. Now we go to the instance data directory; this is where we are going to upload the training pics for our model, so I changed it to 'siddhant 3'. In the class data directory I put 'person', and down here as well it's again 'photo of a person' and 'person'. After making all of these changes, click Run.

Now we come to the actual training part of Stable Diffusion, and to be more precise, training a model that doesn't look like this but actually generates great photos like this; it all depends on how you train your model. Before we start, there are some crucial things you need to keep in mind. First, Stable Diffusion is trained on a database of images at 512 by 512 resolution, meaning it understands compositions that are 512 by 512 pixels; it's as if it can only see through a window of just 512 by 512 pixels. That's the main reason why, whenever you put a wider image through Stable Diffusion's image-to-image, it gives you a double character when the original image had just one. Why? Simple: Stable Diffusion breaks your whole image into little chunks of 512 by 512 pixels and treats them as separate entities, so if the prompt says 'photo of a person', it tries to see a person in every square, leading to those faulty generations with multiple characters. The same is true when the image is cluttered with too many things: again, the system will break the whole image into small squares of 512 pixels and try to see a person in every square.

So while uploading your training images, there are a few things you must keep in mind. First, all your images should be around 512 by 512 pixels; if not, at least make sure all your pictures are square. Second, make sure you, the main subject, are the only person in all of these images, no extra people, or else Stable Diffusion will relate 'siddhant 3 person' to multiple people. Point three: make sure there is no extra stuff in your pictures, no walls, no patterned objects, or else it is going to relate all those things to your main subject, that is, 'siddhant 3 person'. For example, while training my model I accidentally included some pictures of me wearing a turban, and because of that, once I was done training and looked at the final results, one out of four images was always of me wearing some sort of turban. So make sure your photos contain only you, without any extra objects, and with only those characteristics that you want in your final images. And now the last point: make sure you cover all the necessary poses and expressions you could possibly make. A basic rule of thumb I used to follow was to have around 50 to 60 images: around 20 to 30 pictures of only my face in all possible poses and expressions, then 10 to 15 pictures that are medium-wide so the program can see what my upper body, basically my head and shoulders, actually looks like, and after that around 10 more pictures of my full body in all possible poses. After sorting out all these images, I not only removed the background from all of them but also changed their backgrounds from white to green, blue and red so that the model doesn't associate me with my background. By doing all of this, Stable Diffusion started to see that the only thing that stayed consistent across all these pictures was my body, and thus it trained itself only on my data.
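If you want to square-crop and resize your photos to 512 by 512 before uploading, here is a minimal preprocessing sketch using Python and Pillow. The folder names (raw_photos, training_photos) are placeholders made up for the example, not anything from the video or the notebook.

```python
from pathlib import Path
from PIL import Image

SRC, DST = Path("raw_photos"), Path("training_photos")
DST.mkdir(exist_ok=True)

for path in SRC.glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    side = min(img.size)                      # largest centered square that fits
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    square = img.crop((left, top, left + side, top + side))
    square.resize((512, 512)).save(DST / path.name)
```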
So after choosing your 60 or so images, all you need to do is go back to your Colab notebook and go to this section. Here you have two ways of uploading your images: you can press Play here, which gives you an upload box where you can upload your pics, but what I found is that the easiest way is to go to your folders, click on the data folder, where you will find a folder named 'siddhant 3' that was created in a previous step, and then drag and drop all your images into it. After that, go to the next step, and make sure you don't press this play button, because if you do, the code will show you an error since you already uploaded all your pictures directly into the folder.

The next step is to move to this box; this is where the actual training happens, but before playing it you need to change some settings. Go to max training steps and change it from 800 to 1000. This setting basically determines how many times you want the AI to go through all your images; what I found is that 1000 is a good sweet spot, beyond which the system starts to go crazy. After that, go to the save sample prompt and write the prompt you are going to use, which here is 'photo of siddhant 3 person' instead of 'photo of zwx dog'. Now press Run and the actual training starts. Please be patient with it, as it's going to take some time; mine took around one and a half hours, just for reference.

After you are done with that, it's simple: just press Play over here, and this will give you a preview of how your generations are looking. After that, go to 'convert weights to ckpt'; this will create a .ckpt file of your model, and that's what you are going to use whenever you want to use your model, so hit Run and press Play over here as well. And finally you've got your model ready, which you can use right here: 'photo of siddhant 3 person', hit Run, and there you go. But even then these photos do not look as good as the images you see on social media, so have we done something wrong? Well, not really; it's all in your prompting, how well you can explain to the AI what you are trying to accomplish. And for that we have some cheats: all you have to do is go to this website called PromptHero, select any picture you like, copy its prompt, and paste it over here. Instead of 'a punk girl', I wrote the prompt name of the new model I had trained, followed by 'person', which indicates that the subject is human rather than an object. After this small change, hit Run and voila, you've got yourself your own AI model that creates your own AI avatars, just like the app Lensa, but for free.
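As a side note, once the .ckpt file is downloaded, you can also run your model outside the notebook. Below is a hedged sketch using the diffusers library, assuming a recent version in which StableDiffusionPipeline.from_single_file is available and a GPU with enough VRAM; the checkpoint filename and the prompt wording are illustrative, and the Colab notebook has its own inference cell that does the equivalent.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the DreamBooth-trained checkpoint exported by the notebook
# (the filename here is an assumed example).
pipe = StableDiffusionPipeline.from_single_file(
    "siddhant3.ckpt",
    torch_dtype=torch.float16,
).to("cuda")

# The unique token plus the class word ("person") from training is the subject.
image = pipe("photo of siddhant 3 person, studio portrait, highly detailed").images[0]
image.save("generation.png")
```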
So now go crazy with it. Want to turn yourself into Kratos from God of War? Here you go. Love Metal Gear Solid? Well, now you can turn yourself into Big Boss himself. Feeling weak and under-confident? Worry not, because now, with the help of AI, you can turn yourself into Giga Chad himself, the literal embodiment of perfection, just with the click of a button. If you liked this video, then you might like this one as well, where I turned Black Widow into a Disney princess using Stable Diffusion.
Info
Channel: That Arts Guy {Siddhant}
Views: 27,257
Id: eh1LOS7TFZ8
Length: 11min 48sec (708 seconds)
Published: Fri Jun 02 2023