Transform Your Videos into any LoRA Style with Stable Diffusion

Video Statistics and Information

Captions
Today we are going to see how to transform your videos with Stable Diffusion using LoRA models. In particular, I'm going to use the Studio Ghibli LoRA model from Civitai, and we are going to make this transformation without training a model. Obviously training a model is better because it gets you better results, better outputs, but it's very slow. What we want to do today is transform your videos quickly and easily, and we can do that by using the img2img tab within Stable Diffusion.

Before starting, let's have a look at the requirements you need to follow for this tutorial. First, you need to have Stable Diffusion; this can be local on your computer, or through Colab, RunPod, or Paperspace, whatever you want. Then you need a video to edit, and to modify it we need to split the video into image frames. For that you will need video-to-frames software. This can be free, for example DaVinci Resolve, or ezgif.com, which I found online and is quite useful; or if you have Photoshop you can use that as well, but that one is paid. You also need frames-to-video software, because at the end of the process we will have to aggregate all of the generated images back together; again, we can use DaVinci Resolve, which is free and quite useful.

Then, within Stable Diffusion, I'm going to use ControlNet. If you don't have ControlNet you will need to install it. You can do that in the Extensions tab of Stable Diffusion: grab the URL from GitHub, choose "Install from URL", copy and paste the URL there, click Install, and it's automatic. Now, when you install ControlNet you don't actually get the models, just the preprocessors. So you need to go to Hugging Face, which is where these kinds of models usually live, and download the .pth files.
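As a sketch of where those weights come from: Hugging Face exposes direct-download URLs of the form `https://huggingface.co/<repo>/resolve/main/<file>`. The repo id and filenames below are assumptions based on lllyasviel's ControlNet 1.1 release, so check the actual repository pages before downloading:

```python
# Sketch only: build direct-download URLs for ControlNet .pth weights.
# Repo id and filenames are assumptions (lllyasviel's ControlNet 1.1 release);
# verify them on Hugging Face before downloading.
BASE = "https://huggingface.co/{repo}/resolve/main/{fname}"

def model_url(fname: str, repo: str = "lllyasviel/ControlNet-v1-1") -> str:
    """Return the direct-download URL for one model file."""
    return BASE.format(repo=repo, fname=fname)

for f in ("control_v11p_sd15_canny.pth", "control_v11p_sd15_openpose.pth"):
    print(model_url(f))
    # e.g. fetch with: urllib.request.urlretrieve(model_url(f), f)
```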
For example, for Canny you have to download the canny .pth file; you can just click the little download button next to it (let me zoom in a bit, this button here). Same for OpenPose, and for SoftEdge you have this one; I'm going to use that as well.

Why are we using ControlNet? We're using it to limit flickering. Flickering is when the whole image goes a little bit crazy from frame to frame. We cannot remove it entirely, but we can reduce it: with ControlNet we can control the position of your body, your hands, your face, and create an output which is more coherent with the input. Another ControlNet model I'm going to use is MediaPipe. You are not going to find MediaPipe at that first link, but at this other link; in this case you will have to download both the .yaml file and the .safetensors file. MediaPipe is quite useful because with it you can control the facial expression: if you're smiling, it will output exactly the same smile as the input image.

Once you've downloaded these files, you need to move them from the download folder into the Stable Diffusion web UI. Let me show you where: inside stable-diffusion-webui, go into extensions, then sd-webui-controlnet, then models, and that's where all of your downloaded files go, both the .pth and the .yaml files.

Cool. Another thing about ControlNet: I'm going to use four types of ControlNet units — MediaPipe, depth, Canny, and SoftEdge. When you initialize Stable Diffusion for the first time you can use just one ControlNet model; if you want to use more, you need to activate them. To do that, go into Settings, scroll down until you see ControlNet, and find "Multi-ControlNet: Max models amount". Change this to four or more, depending on how many you want to use. Once you've chosen your number, Apply settings and Reload UI. Then, if you go into img2img (or txt2img) and scroll down to the ControlNet section, you will see four tabs, because I chose four: four ControlNet units, and in each tab you can choose a different ControlNet model. We'll see this later.

The last thing is face restoration. If you scroll down in this img2img tab you can see "Restore faces". You can adjust the settings for face restoration, again in Settings, under Face restoration, where you can choose whether to use CodeFormer or GFPGAN. CodeFormer is usually used for more realistic faces, and GFPGAN is usually used more for anime, which is why in this case I'm going to use GFPGAN. You can also choose not to use face restoration in the first part of the image generation and instead apply it when you upscale the image, in the Extras tab: if you scroll down there you have "GFPGAN visibility" and "CodeFormer visibility", each a value between 0 and 1. You can use a combination of the two, or just one of them; it's really up to you.

We are going to use a LoRA model to apply our transformation to the video. Actually, let me show you the final result I got compared to my original video. This one on the right is my original video, and here I have two different transformations: one with GFPGAN applied before upscaling the image, and this one with GFPGAN applied afterwards. Another difference is the denoising strength. Here I'm using a lower denoising strength, meaning the final video will be more similar to my original video, so it's using less creativity; in the middle one the denoising strength is slightly higher, which means the final video will be a little bit more creative, and as you can see it's probably more aligned to the Ghibli style than to the more realistic original video.

Okay, let me play this for a second; it's just six seconds. "Hello everyone, today we are going to make this." What you probably noticed is that in the middle case the flickering problem is a little worse, and this is because of the denoising strength: the higher the creativity we allow in our Stable Diffusion settings, the higher the flickering in the final output. So it's really up to you what you want to do. There are some techniques for reducing flickering, and we'll see them towards the end of the video when we put everything together, but it's still an open issue right now.

All of this is to tell you that this approach is for when you want to create a video which stays similar to the original one. You see, the modification I'm applying is really just a style change. If I wanted to modify myself — say, change myself into a robot or an animal — I could not have done it using img2img, because the flickering would have been crazy; in that case, training a model would be the way to go.

Okay, with that said, we can start with our video. Once you've chosen your video, if you have DaVinci Resolve you can import it via File → Import Media, and then drag and drop the clip into the timeline if you want. Something really important I want to show you is the FPS, the frames per second; this is very important for when you split your video into frames. In this first tab you have your video — you see the clip name is "Lora" — and the FPS column is telling me that the FPS for this video is 30.
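As a quick sanity check, the relationship between FPS, duration, and frame count is just multiplication; a minimal sketch (the function name is mine):

```python
import math

def expected_frames(fps: float, duration_s: float) -> int:
    """How many still frames a clip yields when split at its native FPS."""
    return math.ceil(fps * duration_s)

# 30 FPS x 6 seconds -> 180 frames.
# (The clip in this tutorial actually produced 183, presumably because
# it is slightly longer than six seconds.)
print(expected_frames(30, 6))
```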
That means each second of the video is made of 30 images, so when you split your video, keep in mind how many FPS it has, because this number will be asked for when generating the frames. It's important for two reasons: first, the higher the number of frames, the slower Stable Diffusion will be at generating the images, and that's fine; but also, when you later take all of your generated images and use them to rebuild your video, if you haven't chosen the same FPS as the original, you may end up with a shorter or longer video.

Now, if you're using DaVinci, you can split the video in the last tab here. Choose the file name and the location using Browse, and under Format change it to TIFF, which is an image format; you can choose the resolution and the frame rate. In the case of DaVinci there's just one frame-rate option, 30, which is perfect. Once you've set everything, Add to Render Queue (I'm going to use the desktop just to show you how it works) and then render it. Once this has finished, you will see all of the TIFF images inside the folder you chose. The only downside is that Stable Diffusion doesn't read TIFF images, so if you decide to go this route you'll need to convert your images from TIFF to PNG or JPEG; just keep this in mind.

Anyway, my video is six seconds. Given that I'm extracting 30 images per second, I'm going to have 30 × 6, so around 180 frames, and here they are. You'll notice they're already in JPEG; that's because I did it with Photoshop, and I'm going to show you how in a moment. Something really important is how to name these files. They have to have this format: a name followed by a zero-padded number, or even just the number, like 001, 002, and so on up to 999, because Stable Diffusion will take them in order — and it sorts names alphabetically, not numerically. So if you name them 1, 2, 3, ..., 10, 11, it's not going to work: Stable Diffusion will take 0, 1, 10, 11, and you'll end up with all your images shuffled and no way to put the video back together. If you have more than 999 images, pad with four zeros instead.

The other option I mentioned is the website ezgif.com. I've uploaded my video there already; the only catch is that under frame rate you can choose only 5, 10, or 120 FPS, while as I said this should really equal 30. When you convert, it gives you all JPEG files, and at the bottom of the page you can easily download them all as a zip ("Download frames as ZIP").

Another way of cutting the video is the ControlNet m2m script. It's free inside Stable Diffusion. It doesn't work for me, for some weird reason — it doesn't even give me an error, it just does nothing — but if you get it to work, here's what to do. Go into Settings, then ControlNet, and tick the box "Allow other script to control this extension"; Apply settings and Reload UI (I can actually do it now, then reload). Then, down in the Script dropdown, choose controlnet m2m, upload your video, and choose the duration. I assume the duration is the FPS, but I'm not 100% sure; it's still not clear — I looked everywhere online and no one is actually able to say anything about this duration, but I assume it is. Once you've uploaded the video you don't have to touch anything else; just press Generate and it should work automatically.

The final way of doing it is via Photoshop. This is not free; it's just for if you already have Adobe Photoshop on your computer. Go to File (sorry, my UI is in Italian), then Import, and you'll see something like "Video Frames to Layers". Choose your video, and then you can choose whether to load the video from the beginning to the end, or select just an interval by moving the sliders. My video is just six seconds, so I'm going to do beginning to end. Press OK, it creates the layers, and if you look at the Layers panel you can see I have 183 layers; there you go, all of them. To export them, go to File → Export → Layers to Files (or something like that). Then you can choose your folder (again, I'll go for the desktop), the first part of the file name, and the file type, and then click Run. There's a quality setting here as well; it's not that important. Stable Diffusion is intelligent enough that you don't need full HD or a very high resolution, but it does need to be a good-quality image — and obviously, the higher the resolution, the more time Stable Diffusion will take per image.

Okay, cool. Now that we have a folder with all of our images, named as we said before (00001, 00002, and so on), we can finally start Stable Diffusion. I'm using the AnyLoRA checkpoint; it's a checkpoint made to work with all LoRA models and usually works very well for anime models, so I'm using this. If you want to make something more realistic, I would suggest using Stable Diffusion 1.5 instead. The LoRA model I'm using is Studio Ghibli, this one here. It's a very good one, it always works, and I really like Studio Ghibli, so it's quite nice.

Okay, now we're ready to start. The first thing to do is go into the img2img tab, and we need to choose a good picture among all of the frames we extracted before. It has to be a picture showing the most important details clearly — for example your hands (without motion blur), or your teeth, or the full body if you're doing a full-body video — and you upload it here. I'm choosing frame number 165, which I picked earlier; there you go. Then, just to get a starting point instead of writing everything yourself: given that we want to apply the Ghibli style, we can actually go on Civitai, grab the generation data from this example image, and copy and paste it into our positive prompt. Then we click the button below the Generate button, and all of the settings will be applied to our Stable Diffusion. Obviously we need to make some modifications: for example, we can remove "arms behind the back", "bare shoulders", "smile", and maybe add something like "talking"; and we can remove the LoRA tag for a model we're not using (I don't even have it). The negative prompt is usually fine. Then we can use "Crop and resize". I don't want a vertical shape but a horizontal one, so 960×540.

Then I'm going to keep this Clip skip setting. Not sure if you know what this is; it's quite interesting, and not many people talk about it. CLIP is a model for generating text and image embeddings: it transforms the image into a vector of numbers. This model is made of layers of computation, and the deeper you go into the layers, the more specifically the generated vector represents the concept in the image. Let's make a very basic example. Say you have a picture of a dog, and the dog is a French Bulldog. In the first layer, the CLIP model will give you "dog". The deeper you go — say, the second or third layer — you have the dog's breed, in this case "bulldog"; deeper still, you have a more specific dog type, in this case "French Bulldog". When we set Clip skip to 2, we are telling CLIP to stop not at the last layer but at the second-to-last layer. This is helpful when you're training a model, and also when you're generating images, because you use less computation. Clip skip doesn't work with Stable Diffusion 2.0; it works only with Stable Diffusion 1.5 and the other models trained using CLIP. So in this case I'm going to leave it as is.

ENSD stands for Eta Noise Seed Delta. The value 31337 is used as a standard number, and it's just additional noise added on top of the seed — the seed is the noise, right. This is relevant when you actually have a fixed seed, like in this case; but if you have -1, it means random: every time you generate an image it picks a different seed, so different noise is added to the image and you get different outputs. So here it doesn't really matter and we can remove it.

Then we have the ControlNet section with our four ControlNet units, where we are going to apply our ControlNet models. We upload exactly the same image we used for the previous section — in my case number 165, this one. Then you enable ControlNet; if you don't do that, ControlNet is not going to work. In this first unit I'm going to use MediaPipe, both for the preprocessor and the model. The default settings are usually fine, so I'm going to keep them. Here's something interesting: the Control Mode, where you can choose "Balanced", "My prompt is more important", or "ControlNet is more important". This is new in ControlNet, something that came out over the last one or two months. There's a nice example here if you scroll down: you can see the original input and the output using the three different options. With "My prompt is more important" the output is more aligned to the chosen style; with "ControlNet is more important" it gives more importance to ControlNet and to the original picture, so the final result is more realistic; and "Balanced" is a middle way between the two. So depending on what you want, you can choose one or the other. In my case, for MediaPipe I'm going to use "ControlNet is more important", just to try — you can always change it. Then you do exactly the same for the others, but obviously not with the same preprocessor and model: in unit two I upload the image, enable, use depth, and maybe "My prompt is more important"; in unit three, upload, enable, SoftEdge, "My prompt is more important"; and in the last one, enable, Canny, "My prompt is more important".

Once we've set up these general settings — and actually I'm also going to tick Restore faces — I press Generate, and let's see what happens. Here you go: you have the generated image, and alongside it the ControlNet outputs. This is MediaPipe, which gives you the face and its expression; this is depth, which tracks the depth of your image — lighter where it's closer, darker where it's farther; this is SoftEdge, which traces the edges of the elements or subjects; and this is Canny, which is similar to SoftEdge but a bit stricter. I'm not sure I want to keep Canny, actually, but I'll leave it for now. Here we're also applying face restoration. If we remove it, let's see what happens: the faces are less precise, and that also means you will have more flickering. We can increase the denoising strength if we want something more creative; let's run it like this, and this is obviously more in line with the Ghibli style. You need to play around with all of these settings to get what you actually want in the end.

Once you have a good image, something you actually like, we're ready to process the batch of images. Here you can see all the settings used to generate the final image (this one, in my case), including the seed, which is very important: copy it and paste it into the seed field. This will allow you to use the same noise for all of the pictures Stable Diffusion is going to process, which also means less flickering; if you don't fix the seed, it changes on each iteration and you get different results every time. After that, go into the Batch sub-tab and put in the path to our input directory, the frames we extracted before. In my case, on Windows, I go inside my folder, right-click → Properties, and copy the location; on Mac it's much the same — Get Info, and copy the directory shown there. Paste it into the input directory field. Given that we want to use exactly the same images for ControlNet, we don't need to put anything in the ControlNet input directory: it's going to reuse the same input directory. But we do need to go inside each ControlNet unit, remove the single image, and choose Batch mode. Once this is done, press Generate. This will take a little bit of time: it took me around 30 minutes to run all of the images, and I had about 180 pictures (183, actually), so depending on how many pictures you have and the resolution you're applying, it will take more or less time.

Once it's finished, click the folder button and you'll find the outputs in the img2img-images folder, unless you changed the output directory, obviously. Open the folder and you have all of the outputs. In this case I used slightly different settings; I'm going to share my settings with you on Google Drive in case you want to use them, but every video is different, so you'll probably need your own settings. To reuse them, copy the settings text, paste it into the prompt like we did before with the LoRA example, and click the apply button; there will probably be some settings you still need to change, but you can see them all there. I'll share everything with you in case you need it.

Okay, now that we have all of our images generated in the anime style, we can go into the Extras tab and upscale our images. We can do that using the GFPGAN visibility (if you're doing anime, want to restore faces, and didn't use face restoration before) or CodeFormer (if you're doing something more realistic). Let me show you quickly for a single image: pick one from your img2img output folder. I'm going to resize by 2 because I don't need more than that, and I'll use the R-ESRGAN 4x+ Anime6B upscaler, which is made specifically for upscaling anime pictures, so it's quite good in this case. I'm not going to use a second upscaler, but it's really up to you; you can do whatever you prefer. I already applied GFPGAN to this picture, so for now I'll just hit Generate to show you; here you go, this is the upscaled picture, which looks quite nice. Once you're happy with that, apply the upscale to all of the pictures: go into "Batch from directory", and for the input directory (again, click the folder button, go to outputs, then img2img-images — mine is named with today's date) copy the directory path and paste it in. Just make sure that inside this folder you have only the images you want to upscale, because otherwise it's going to take longer. You can choose a different output directory if you want, and then press Generate. This is going to be quick — probably two or three minutes, depending on the power of your computer. Cool. Once this is done you'll have an extras-images output folder (again, unless you chose another directory) with all of your 183 images; I just generated these last three test images, which I can remove. There you go, 183 of them, because the numbering starts from zero.

We are now ready to put them all together to create our video. I'm going back to DaVinci Resolve, because that's what I'm using: create a new project, import the original video, and drag and drop it into the timeline. The part I was using was something like this; you can move the playhead — this shows where you are in the timeline — and then cut, either with the razor tool or with the keyboard shortcut, and remove the rest with Shift+X. Oh, I missed the last part — that's fine. So we have this one. Now, to create the video from our images, we grab all of the images we have in our outputs.
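The zero-padded naming requirement from earlier can be automated with a small script; here's a minimal sketch (it assumes the frames are named with bare numbers like `1.png`, `2.png` — adjust for any prefix):

```python
from pathlib import Path

def zero_pad_frames(folder: str, width: int = 4, ext: str = ".png") -> None:
    """Rename numeric frame files so they sort correctly: 1.png -> 0001.png.

    Assumes every matching file's stem is a bare integer; files that are
    already padded are renamed to themselves, which is a no-op."""
    for p in Path(folder).glob(f"*{ext}"):
        p.rename(p.with_name(f"{int(p.stem):0{width}d}{ext}"))
```

After running this, both DaVinci's image-sequence import and Stable Diffusion's batch mode will pick the frames up in the intended order.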
We'll probably need to rename them so there's an index, otherwise DaVinci won't recognize them as a sequence of images. Then select all of them and drag and drop them into the timeline — easy — and there they are, and that's really it. Now, obviously I didn't cut this exactly right, so maybe I can move this edit a little to the right, and this one to the left, to where the mouth starts opening; maybe it's like this. "Hello everyone, today we are going to make this." Yes, that's how it should be, so we cut there. Okay, so we can compare them now: "Hello everyone, today we are going to make this." You can see there is some flickering in the video on the right, but it already looks quite good.

If you're paying for DaVinci Resolve Studio, which is about $300 for a lifetime license, you can go into Effects, type "deflicker", and drop the Deflicker effect onto the clip, like this. I don't have it, so I'll say "not yet", but it shows you how it would look, with a watermark on top, which is quite useful for comparing. Actually, to better compare them, let me crop this a little more and move it. Okay — you can see the Deflicker effect works quite well.

Now I'm going to show you another way of doing it without paying; obviously it's not going to be quite the same. I found this technique in a video called "video to anime". That creator is training a model rather than doing img2img, but I found the trick quite good for reducing flickering. So let's remove the Deflicker effect — removed — and here's what we do: copy the image clip (Cmd+C, Cmd+V) and paste the same images on a track on top, so it's exactly the same footage twice. Then we click on the top one and change its opacity to something like 15, and change the composite mode to Color, or maybe Darker Color. Then we also need to right-click on the timecode and choose "Source frame", go back to the beginning, zoom in, and shift the top track by just one frame, dragging it to the right — you see it. Then we zoom back out, not too much, and look at the result: "Hello everyone, today we are going to make this." The flickering is still there — it's very difficult to get rid of entirely — but it looks way better than before; toggle the top track off and on again and you'll see the difference.

And that's it for today. I hope this was useful and you enjoyed it. Let me know if you have any comments or suggestions, always happy to listen, and see you in the next video. Bye!
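The free deflicker trick above — duplicating the track, offsetting it by one frame, and compositing at roughly 15% opacity — is essentially a temporal blend. A toy sketch of the math on plain pixel-value lists (not a real video pipeline; the function name and values are mine):

```python
def temporal_blend(frames, opacity=0.15):
    """Blend each frame with the previous one at the given opacity,
    mimicking the offset duplicate track. `frames` is a list of
    equal-length lists of pixel values (one inner list per frame)."""
    out = []
    for i, frame in enumerate(frames):
        prev = frames[max(i - 1, 0)]  # first frame pairs with itself
        out.append([(1 - opacity) * a + opacity * b
                    for a, b in zip(frame, prev)])
    return out

# A pixel flickering between 100 and 120 gets pulled toward its neighbour,
# roughly 100 -> 100, 120 -> 117, 100 -> 103.
blended = temporal_blend([[100], [120], [100]])
```

Each output frame is mostly itself plus a small share of its neighbour, which is why the flicker is damped rather than removed — exactly what the stacked-track version shows in DaVinci.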
Info
Channel: Laura Carnevali
Views: 23,071
Keywords: stable diffusion, stable diffusion v2, diffusion, ai art, diffusion model, generative ai, generative art, openai dalle, stability ai, ai artist, imagen, nft, apple silicon, m1, stable diffusion on m1, stable diffusion with python, stable diffusion hugging face, stable diffusion github, stable diffusion v1.5, stable diffusion tutorial, lora, lora models, stable diffusion lora, civitai, hugging face, civitai lora, hf lora, use lora, lora sd, image to cartoon, studio ghibli
Id: 0Eg-ArDwFxU
Length: 35min 12sec (2112 seconds)
Published: Sat May 20 2023