TEXTUAL INVERSION - How To Do It In Stable Diffusion (It's Easier Than You Think) + Files!!!

Captions

How to train Stable Diffusion with your own images. Hello my friends, how are you doing? In this video we are going to talk about textual inversion. It sounds super complicated, but it's actually very easy to do. For this I'm using the Stable Diffusion local install, AUTOMATIC1111; here is the installation guide.

Before we get started, let's first talk about what textual inversion is. You can see here on this page, which I'm going to link under the video, the Stable Diffusion Conceptualizer. Here you'll find a lot of different styles and subjects that have already been pre-trained for Stable Diffusion, so you can download them and use them right away. The cool thing is that over here on the right side you can try them out. You can see I'm using this style here with the simple word "princess", and the output is kind of similar to what we see over here on the left side. Note that the things you see on the left side are input images, not output images, so these are the originals and not what has been created with Stable Diffusion.

To download those, right-click on the name, select "Open link in new tab", go to the new tab, and go to "Files and versions". The only thing you really need in here is this file. Click on the arrow that is pointing downwards and it asks you to save. Give it a name that you can remember, maybe the name of the style or something else; the name you save it under is also the word you use in your prompt to trigger that style in Stable Diffusion. Of course, you have to save this inside the Stable Diffusion folder: go to your Stable Diffusion folder, where you have an embeddings folder, and save the file in there. You can see I already have a lot of .pt files in here, and .bin files too; these are the different styles.

If you have AUTOMATIC1111 installed but you don't have the embeddings folder and you don't have
a Textual Inversion tab up here, your version is outdated. What you need to do is simply go to the Stable Diffusion website, click on Code, download the ZIP file, unpack it, and throw everything in there into this folder so that it replaces all of these files. After you've done this, some of your settings might have been reset to the defaults, so go into Settings and check that everything is still as before, and if you make changes, click Apply.

Now that we have a basic understanding of what textual inversion is, let's see how it is done in Stable Diffusion. First of all you need a name here. You can choose anything, but don't use words that you use often. For example, don't write "dog" just because you train on images of your dog; add maybe a number to it, like 2022, so it is more unusual and you won't use that style by accident. For the initialization text you can basically write the same thing, but afterwards you can just use the file name to trigger that style.

The next one is the number of vectors per token. You can set this to anything you want, but I found that a value between 8 and 10 is pretty good. Now what are these tokens? Let's have a look at the explanation. Here it says the number of vectors per token is the size of the embedding: the larger this value, the more information about the subject can fit into the embedding, but also the more words it will take away from your prompt allowance. You probably didn't even know that Stable Diffusion has a prompt allowance. The prompt can't just be endlessly long; it actually has to be rather short. There is this page, which is called Tokenizer, and when you put your prompt in here it will tell you how many tokens are used. That means if I have this as my prompt and it tells me 64 tokens are already used, and I set the number of vectors per token to 10, I add another 10 because of the textual inversion, so I'm already at 74.
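The token arithmetic described here can be sketched in a few lines of Python. This is only an illustration, not part of the video: the function names are made up, and the limit of 75 is an assumption (the figure commonly cited for Stable Diffusion v1, where the text encoder's 77-token context includes a start and an end token). The video itself only says that 74 is "pretty close".

```python
# Assumption: 75 usable prompt tokens, the figure commonly cited for
# Stable Diffusion v1 (a 77-token context minus the start and end tokens).
PROMPT_TOKEN_LIMIT = 75


def tokens_used(prompt_tokens: int, vectors_per_token: int) -> int:
    """Tokens consumed by the prompt plus one use of the embedding."""
    return prompt_tokens + vectors_per_token


def tokens_remaining(prompt_tokens: int, vectors_per_token: int,
                     limit: int = PROMPT_TOKEN_LIMIT) -> int:
    """Tokens still free; a negative value means the prompt is over budget."""
    return limit - tokens_used(prompt_tokens, vectors_per_token)


if __name__ == "__main__":
    # The example from the video: a 64-token prompt and 10 vectors per token.
    print(tokens_used(64, 10))       # 74
    print(tokens_remaining(64, 10))  # 1
```

With a smaller number of vectors per token, the same prompt leaves more room; that is the trade-off the explanation text in the UI is pointing at.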
So this brings it pretty close to the limit. After you've done this, click on Create, and this will create that file for you.

The next step is to preprocess the images. For that you need a source directory and a destination directory. Anywhere on your hard drive, create a folder for your project with an input folder for the original images and an output folder where Stable Diffusion can put the preprocessed images. For the input images you want to have a bunch of them, at least 15; you can also have 50 or 100 pictures, but have enough. These pictures should be fairly similar but not too similar, and a little bit different but not too different. In my case I created these images in Midjourney, and I wanted to see if Stable Diffusion can do the same thing. So I created a bunch of bunny pictures, and hands down Midjourney is doing a better job on that, but seeing what Stable Diffusion can do is also very interesting. I also reduced the size of the images to 512 by 768.
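If the source pictures do not already have the 512 by 768 ratio, they need to be cropped before they are scaled down, otherwise they get stretched. The video does not show how the images were resized, so the following is only a sketch of one way to work out a centered crop; `center_crop_box` is a hypothetical helper, and the resulting box would still have to be passed to an image editor or imaging library of your choice.

```python
def center_crop_box(width: int, height: int,
                    target_w: int = 512, target_h: int = 768):
    """Return (left, top, right, bottom) of the largest centered crop
    that has the same aspect ratio as target_w x target_h."""
    # Compare aspect ratios with integer cross-multiplication to avoid
    # floating-point rounding.
    if width * target_h > height * target_w:
        # Source is too wide: keep the full height and trim the sides.
        new_w = height * target_w // target_h
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Source is too tall (or already matches): keep the full width.
    new_h = width * target_h // target_w
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


if __name__ == "__main__":
    # A square 1536 x 1536 picture loses 256 pixels on each side.
    print(center_crop_box(1536, 1536))  # (256, 0, 1280, 1536)
```

After cropping to this box, scaling the result to 512 by 768 keeps the proportions intact.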
The resolution is important, not only because it defines the ratio of your image, but also because a high-resolution image will make the training time considerably longer.

To get the location of this folder, up here you have this bar, similar to your browser. Click in there and you can see the path is already highlighted in blue; if it isn't, simply click and drag your mouse over it to select everything. Then press Ctrl+C to copy it and Ctrl+V to paste it in here for the source directory, and also in here for the destination directory, but rename this one to "output", because this is going to be your output folder. Then, and this is really important, you need to set the size correctly: in my case 512 for the width and 768 for the height.

Down here you have several options to choose from. "Create flipped copies" flips every image to the other side, so you basically have double the number of pictures; this you can ignore. Over here you have "Use BLIP caption as filename". What this does is look at every image and create a file name that contains a description of that image, so make a check mark on this. The next one you can ignore, and the third one, make a check mark on that too. Click on Preprocess, and all of the images from the input folder are processed into the output folder.

Then select the address of the output folder, go down to where it says "Dataset directory", and paste it in there. For the embedding, simply select the name you created before. You can leave the learning rate and the log directory as they are. Next we have the prompt template file. Stable Diffusion offers several of these files in the textual inversion templates folder. When you open it up you have text files: style, style with file words, subject, and subject with file words. When you double-click on one, you can see a list of prompts. These are the prompts the AI is
using to train on your images. The name is whatever you typed into the name field at the start of that form, and the file words are whatever the AI has put as an image description in the file name. In this case, because I'm training a subject and I want to use the file words, you write style_filewords.txt here, and of course the rest of the address stays the same. You can also experiment with writing your own prompts in here, although I used the template file for my project.

Next you are again asked for the resolution, and this is really important: set it to the same resolution as up here, because otherwise you get skewed results that are not going to look good. So in my case I set this to 512 by 768 as well.

Next we have the max steps, and this takes considerable time. I have a pretty beefy computer with a 3080 Ti, and 20,000 steps took me two and a half hours. What you can do here, and this is really good, is set it to maybe 10,000 or 20,000 steps; I think at 20,000 steps you already get something rather good. Afterwards, if you're not happy with the result once training has finished, you can come back to this page and train it up to 30,000, 40,000, 50,000, whatever you want. If you want to do this, you can ignore this part up here, because we have already done that, but you need to select the name here under Embedding and set up the other settings the same way again: the dataset directory, the prompt template file, and the resolution. Then you simply set this to 40,000, for example, and click on Train, and it will start where you left off the last time. But make sure you set everything the same, so before you start your first training, maybe make a screenshot of that page so you know the settings you used last time.

This one I think you can ignore. The next one creates a sample image every given number of steps, and of course you're probably very
curious, so I set mine to 150. This means that every 150 steps it is going to render a sample image. This will slow down the process, because the training has to stop, an image has to be rendered, and then the training continues. So for the first run I would maybe set the maximum steps to, let's say, 20,000 and these steps here to 150, so you see what you're getting. Then, if you like the results, you can come back here, set this to, let's say, 30,000 or 40,000 and this up here to 500 or even 1,000, because you don't really need to see all of these sample images.

Another thing that is really useful is down here: save a copy of the embedding. What this means is that while the embedding, the textual inversion, is training, this will make a backup copy of it, in this case every 500 steps. This is not only very helpful as a backup so you don't lose your progress; the cool thing is that you can also use these copies afterwards to test the different versions, because sometimes you get a better result with a less-trained file. So leave that at 500; I find that's a pretty good value.

After everything is set up, simply click on Train and wait until it's finished. I would suggest that once in a while you check in here to see what the sample image is showing you. If the results are completely unrecognizable and strange, there might be something wrong with your input images, so try to find other images or edit them in a different way.

So after this has finished training, where are the files and how do you use them? In your Stable Diffusion folder you will have a textual inversion folder. When you click on this, you see different dates; this is the date on which you trained the model. If you stop the training and continue on another day, the new results will be in the new day's folder, of course. So I double-click on here, then I have my project name, double-click on that too, and
then we have the embeddings, and here we have the test images that show me how well the process is running. In the embeddings folder you have all of these files; these are the backup files that have been created. When you go back to your Stable Diffusion folder, you also have the embeddings folder, and this is where you will find the file with the project name. I have several versions in here; I renamed them depending on the number of steps used for that specific file. So it's no problem to rename the file afterwards, and then you can call it with the file name minus the .pt.

So let's go to text-to-image (you can also use it in image-to-image, by the way). Enter your prompt in here, and at the end, I found it most useful to write ", by" and then the name of your project. In this case I'm using chap bun 2 minus 40, because this is the file with the 40,000 steps.

Now let's look at some examples. Here's something that might come as a shock: here we have the Leica version and it looks really good, then we have the 40,000-steps version, and I actually like the original version better. When we go on to other examples, you will see that most often the versions that use the textual inversion have the right kind of clothing as seen in my Midjourney pictures, the right kind of position for the bunny, and a very nicely detailed picture, while what Stable Diffusion often creates is just a bunny sitting next to a candle or just a bunny holding a candle. This one with the 40,000 is actually pretty nice, but when you look here, the 20,000 is in my opinion the clear winner in this case, and even the 10,000 produces a really good result. Here is one that looks really amazing, and this is something that kind of angers me about Stable Diffusion: sometimes it is amazingly beautiful, but it all depends on the seed. If the seed doesn't fit your prompt you get bupkis, and then sometimes the seed and the prompt come
together and make this kind of magic baby of AI that just looks really nice.

Another cool thing I want to show you is that after the training has finished, you can also use other animals that look similar to bunnies in their fluffiness, for example cats or lions. Pandas not so much, but hamsters can give you some adorable results. So you can experiment with this and have a lot of fun. One of the most important things about textual inversion is that it opens up the AI to you for experimentation and creating your own styles. Even if it has its flaws, it's still a really beautiful technique and brings the artistic element back into AI art.

Leave a like if you enjoyed this video. Thanks for watching, and see you in my live stream tomorrow. Bye!

Oh, you're still here? So, this is the end screen. There's other stuff you can watch, like this or that, that's really cool. I hope I see you soon. Leave a like if you haven't yet, and I wish you a good weekend.
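The step settings discussed in the video (max steps, a sample image every 150 steps, an embedding copy every 500 steps) can be sanity-checked with a little arithmetic before starting a long run. The sketch below is an illustration only: the function names are hypothetical, and the time estimate assumes training time scales linearly from the one measurement given in the video (20,000 steps in two and a half hours on a 3080 Ti), ignoring the extra time spent rendering sample images.

```python
def count_events(max_steps: int, every_n_steps: int) -> int:
    """How many times something that fires every N steps happens in a run."""
    return max_steps // every_n_steps


def estimate_hours(target_steps: int,
                   measured_steps: int = 20_000,
                   measured_hours: float = 2.5) -> float:
    """Linear extrapolation from one measured run (an assumption, and
    specific to the hardware and image size used in that run)."""
    return measured_hours * target_steps / measured_steps


if __name__ == "__main__":
    print(count_events(20_000, 150))  # 133 sample images
    print(count_events(20_000, 500))  # 40 saved embedding copies
    print(estimate_hours(40_000))     # 5.0 hours under the linear assumption
```

Raising the sample interval from 150 to 500 or 1,000, as suggested for later runs, cuts the number of rendered sample images accordingly.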

Info

Channel: Olivio Sarikas

Views: 128,887

Keywords: affinity photo layers, affinity photo adjustment layers, affinity photo learning, oliviosarikas, olivio sarikas, olivio affinity, olivio tutorials, stable diffusion, stable diffusion tutorial, stable diffusion ai, stable diffusion ai art, stable diffusion guide, stable diffusion gui, textual inversion, ai art, ai, dreambooth, midjourney, midjourney ai, dall e 2, dalle

Id: MLz0iM0M7Fk

Length: 16min 20sec (980 seconds)

Published: Sun Oct 16 2022