ComfyUI: Stable Video Diffusion Clip Extension (Tutorial)

Captions
Hi, Seth here. Welcome to the channel. I used this image in Stable Video Diffusion and got a forward-moving motion. In the next step, I slowed the motion and animated the aurora. For the final step, I stopped the forward motion entirely while keeping the aurora animated, and this was the result. I used the same technique on another image and looped it in reverse first; the output was a 16-second video. You can do all of this in a single workflow using only Stable Video Diffusion. Let me show you how.

This is an advanced tutorial and an extension of my colleague's basic tutorial for Stable Video Diffusion. I will do everything from scratch and explain in as much detail as possible; however, the correlations between each setting are better explained in that video. As always, all the workflows, including the video outputs, will be available to all channel members. A big shout-out to them for their support. The clip extension method will keep working even after Stability AI releases its planned motion LoRAs for Stable Video Diffusion; the horizontal, vertical, and zoom motion LoRA models will give you complete control over camera direction and zoom.

First, you need to do some installation for the frame interpolation ComfyUI node. Before starting ComfyUI, go to this website and download the NVIDIA CUDA 11.8 executable file. This is not the latest version of CUDA, but as of the date of recording this video, the frame interpolation node does not work with any CUDA version above 11.8. After downloading, run the file. If you are on the latest version of the NVIDIA driver, do not select Express; choose Custom. Here you can check your current driver version against the version that comes with this installer. If your current version is higher, untick the driver components; do the same for PhysX. Make sure everything is ticked under CUDA and proceed with the installation.

After installing CUDA, click on Start and search for the Command Prompt. Enter the path of the Python embedded folder located in the ComfyUI root folder. All the install commands will be posted in the description. We first need to install CuPy for CUDA 11. I have it installed already, so it's giving me a "requirement already satisfied" message. Now paste the following command; this will download the additional library required by CuPy. After it finishes, close everything and start ComfyUI. Go to Install Custom Nodes in ComfyUI Manager and install the Frame Interpolation custom node. Then go on to install Image Resize and the Video Helper Suite.

The basics are fully covered in the previous video; I will leave the link to that video in the description. I will try to explain again whatever I can within the video's time frame.

Start by adding the Image Only Checkpoint Loader. Next, add the Video Linear CFG Guidance node and ensure that the minimum CFG here is set to 1. Add a simple KSampler. I am going to add a preview node here; all the VAE Decode image outputs will have a preview node for the tutorial, and I'll explain in a bit why it is needed. The VHS Video Combine output node is way more powerful than the default Save Animated WEBP.

The image group will comprise three nodes: Load Image, Image Resize, and Preview. For the image resize there will be two nodes; you have to add the one without any space between the two words. Now connect all the nodes. The action will always be set to crop to ratio. If the image has a ratio other than 16:9, you can adjust the crop by fine-tuning the crop_pad_position value. The cropped image should not exceed 1024; the video diffusion model does not support anything higher. The model is optimized for 16:9 or 9:16 ratios.
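As a rough illustration of this cropping step (not the Image Resize node's actual code), here is a minimal Python sketch that centre-crops an image to 16:9 and resizes it to 1024x576. The crop_and_resize_16_9 helper and its crop_offset parameter are hypothetical stand-ins for the node's crop-to-ratio and crop_pad_position behaviour.

```python
# Minimal sketch, assuming Pillow: crop an image to 16:9 and resize it to
# 1024x576, the resolution the SVD img2vid model is optimised for.
# crop_offset (0..1) slides the crop window, loosely mirroring crop_pad_position.
from PIL import Image

def crop_and_resize_16_9(path, out_path, crop_offset=0.5, target=(1024, 576)):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    target_ratio = 16 / 9
    if w / h > target_ratio:
        # Image is too wide: crop the width, sliding the window by crop_offset.
        new_w = int(h * target_ratio)
        left = int((w - new_w) * crop_offset)
        box = (left, 0, left + new_w, h)
    else:
        # Image is too tall: crop the height instead.
        new_h = int(w / target_ratio)
        top = int((h - new_h) * crop_offset)
        box = (0, top, w, top + new_h)
    img.crop(box).resize(target, Image.LANCZOS).save(out_path)

# Example: crop_and_resize_16_9("source.png", "svd_input.png", crop_offset=0.5)
```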
Keep the image node disconnected from the main workflow while you adjust cropping; otherwise the KSampler will start processing the video. Remember to connect the Image Resize node, instead of the Load Image node, to the SVD conditioning. This is important for this workflow, as we will have multiple image groups connected to multiple conditioning nodes.

Different seeds will give you different results. Since I am extending the video using a single image and some of its outputs, I want some level of consistency and control, and for such cases I prefer to keep the same seed. In other instances, where you want the next scene to be very different, you can experiment by changing the seed. Changing the seed also helps when you are not getting near the desired results; often there are glitches or imperfections in the output, and changing the seed helps a lot here.

The steps also make quite a difference in the output; however, it's advisable to be careful here. I would stick to a range of around 20 to 30 steps only. I want to try to animate only the flame for the first clip. That is not entirely possible, but you can control it to a certain extent. For this workflow I will mostly be sticking to DPM++ 2M SDE and Karras, primarily because of the control, motion, and clarity. To better understand the role of the different schedulers and samplers, keep all the settings the same and try different combinations of them.

Another thing that could not be covered in the previous video is that a higher CFG will result in oversaturated end frames. Since I am extending the video from a single image, I want to keep the CFG value as close to 1 as possible. This also depends on how many end frames you will be using. I will generate four clips, combine them, and use one end frame from the second clip to generate the third and fourth clips; the longer I keep doing this, the more the saturation increases. Suppose you use this clip extension technique for completely different images and clips; in that case, the CFG can go as high as 3.5 to 4.5 without oversaturation, depending on the image.

Let's generate the first clip. There is minimal movement in the environment; this is close to what I wanted. With the Preview Image node we can actually see how each frame is combined. There are 25 images here because the video_frames value in the SVD conditioning node is 25. The end frame I referred to earlier would be the 25th image.

Frame interpolation is an AI technique to insert frames between the given images. I will be using the GMFSS Fortuna node here; I had no issues with it, and the result was consistent. The multiplier controls how many frames get added in between the existing frames. For example, a 2x multiplier brings the total number of frames from 100 to 199; an 8x multiplier, on the other hand, generates about 793 frames, and to get an approximately 8-second video from that you can increase the fps to around 90. Frame interpolation is VRAM-intensive. I did not change this value; what it does is clear the cache after every 10 frames. I had no issues going up to 8x. Since this run generates 199 frames, I am increasing the output frame rate to 24, which gives me approximately an 8-second video output. You can see the video is way, way smoother.
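As a quick check of those numbers, here is a small Python sketch of the arithmetic, assuming the interpolator inserts (multiplier - 1) new frames between each consecutive pair of frames; the helper names are illustrative only.

```python
# Sanity-check the frame counts quoted above: four 25-frame SVD clips make
# 100 frames; 2x interpolation gives 199, 8x gives about 793.
def interpolated_frames(frames: int, multiplier: int) -> int:
    # (multiplier - 1) frames are inserted between each consecutive pair.
    return (frames - 1) * multiplier + 1

def duration_seconds(frames: int, fps: float) -> float:
    return frames / fps

combined = 4 * 25                                  # four clips of 25 frames each
print(interpolated_frames(combined, 2))            # 199 frames at 2x
print(interpolated_frames(combined, 8))            # 793 frames at 8x
print(round(duration_seconds(199, 24), 1))         # ~8.3 s at 24 fps
print(round(duration_seconds(793, 90), 1))         # ~8.8 s at 90 fps
```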
This clip extension method applies to two scenarios: you can generate and stitch multiple scenes in a single workflow, or you can use a single image to generate the whole output. The latter is more complicated, hence I will focus on generating an extended clip from a single source image for the tutorial.

I will duplicate the selected nodes and use the same image to generate a different output. The first clip only had the flame moving, with minimal motion in the environment. The output I am generating now is not for aesthetics, and it is random; basically, I am trying to test and show the robustness and limitations of this method. I am using the same Image Resize node. For stitching different scenes, duplicate all the nodes. Since I want more motion, I will only change the motion bucket, steps, and CFG values. First I will test the second output. You will not always get what you like on the first shot; play around with the values in the conditioning and KSampler nodes to get close to the desired results. Just a note: I picked my result for the tutorial beforehand. A few tries are a given for video diffusion at the current state of the released model; the most I went through was about 10 tries for a clip.

There should be three outputs in total, one for each clip, and the last would be the combined result. To combine the VAE Decode outputs, search for a node called Image Batch. The first VAE Decode output connects to the image_1 input and the second to the image_2 input. This node combines the 25 frames from each output in order; it then passes them to the final VHS Video Combine node to stitch all 50 frames. This is how you combine multiple clips generated with Stable Video Diffusion.

Duplicate and connect all the nodes again, but this time I will use a different input image. For this to work as fluid motion, the third clip should begin where the second clip ends. Without much control over the motion, this takes some tries, but it is very much possible if you know your way around the Stable Video settings. The most important thing is that the last frame, the 25th frame, should be a clear image; the image clarity should be as good as the source image. If the frame has motion blur, it only gets worse from there. This is an obvious limitation when making an extended clip from just one source image. I wanted to do all this to see if it was possible in ComfyUI without using any third-party tool, and yes, it is possible; I generated about five to six clips, around 20 to 30 seconds each, before deciding to make this tutorial.

There is a custom node that automatically takes the last image from the second clip and passes it on; however, that custom node requires all connected KSamplers to process every time you hit Queue Prompt, even if no changes are made. So here I prefer the manual way: right-click, save the 25th image, and use that as the second image source.

Here I am adjusting the settings to generate an output similar to the first clip. The output from this seed is really cool; check it out. I will revert it, because I cannot reignite the flame in any way; it just doesn't happen. This is good enough to merge the three clips. Duplicate the Image Batch node again; connect the first batch output to the image_1 input, and the third VAE Decode connects to the image_2 input. This is the output when all three clips are combined. You can keep repeating this process as much as you want for different scenes; it is just a matter of duplicating the nodes and repeating the steps.

To reduce the load on your system, process only one KSampler at a time and hide the previews in the VHS Video Combine nodes; keep a maximum of two or three previews at a time. Also, if you see this error for the frame interpolation node, just close everything and restart; it should work fine.
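Before moving to the second workflow, here is a conceptual Python sketch of what the Image Batch step above amounts to. It assumes ComfyUI's usual [frames, height, width, channels] image tensor layout; the random tensors are stand-ins for the VAE Decode outputs, and this is an illustration of the idea rather than the node's actual code.

```python
# Stitching two 25-frame clips is, conceptually, a concatenation of the two
# image batches along the frame axis, preserving order.
import torch

clip_a = torch.rand(25, 576, 1024, 3)    # stand-in for the first VAE Decode output
clip_b = torch.rand(25, 576, 1024, 3)    # stand-in for the second VAE Decode output

combined = torch.cat((clip_a, clip_b), dim=0)
print(combined.shape)                    # torch.Size([50, 576, 1024, 3])
```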
This next workflow is a bit complicated. All nodes are disconnected except for the first KSampler; I'll explain as I go along. Since this image has a path, it's very common to get a forward-moving motion. Stable Video Diffusion is trained to understand the image and predict the motion flow; that is why it's easy to get these results even with no motion LoRAs. Okay, what I want to do is slow down the motion drastically. I will take the last frame of this clip and use it to generate the second clip. I am changing the sampler and scheduler, as I did not get the desired results. Perfect; let's combine the two clips.

For the next part, I will only animate the aurora while the ground continues in a fluid forward motion. Here I will use the Image Crop node, which was added in the latest update of ComfyUI. The last frame from the second clip will be cropped to animate only the aurora. Oops. Ensure that the height and width match the crop resolution. With the same last frame, I will use the fourth KSampler for the continuous forward motion.

Now I will mix both of them using latent composition. Search for and add two Conditioning (Combine) nodes, add the Latent Composite node, and duplicate the KSampler, VAE Decode, and Preview Image nodes. The first conditioning node is the positive and the second is the negative; connect both KSamplers to the conditioning combine nodes accordingly. Now connect these conditioning nodes to the fifth KSampler. The aurora clip's KSampler output connects to samples_from in the Latent Composite node, and the fourth KSampler connects to samples_to. In this case, the feather value would be the same as the output height; this will blend it nicely when mixed. I find it best to keep the denoising value low, as we want the output to be very similar to the two input clips. Beautiful.

For the last 25 frames, I want everything to be still except the aurora. The last frame from this KSampler should basically be a still with only the aurora animation. However, it is not possible to put a 25-frame clip over a single-frame static image, so I will take this last frame, pass it through a KSampler with zero motion, and generate 25 static images. Then, using the same latent composition process, I will blend both outputs and generate the final 25 frames. The output is correct; this should be no motion. Now I will duplicate all the latent composite nodes and connect the aurora and the last KSampler; all the other settings stay the same. Perfect blending. Connect the previous VAE output to the second batch image input and the last output to the third batch image node, and that's about it. Here is the final output.

To summarize: I took a single image and generated a forward-motion clip through the first KSampler. The last frame from there was taken as input to generate the second clip, which was still moving forward but very slowly. Then I took the last frame from that output and used it as input here; an inbuilt cropping tool was used to crop the top part of the image, which was then used only to animate the aurora. This input was also used to generate another forward-motion clip from the previous frame. Both these outputs were then mixed via the latent composite method. The last frame from that output was used to generate a static 25-frame clip; the aurora and this static clip were blended again via latent composite to output the final 25 frames. The first two outputs combine to create batch one; batch one combines with the first latent composite output; finally, batch two is combined with the second latent composite output. All three batches connect to the frame interpolation node, which gives the final output. Hope the tutorial was helpful, and until next time.
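For reference, here is a conceptual Python sketch of the feathered latent blend described in the workflow: the aurora-crop latent is pasted over the full-frame latent and faded out over the feather rows. The feathered_composite helper, the tensor sizes, and the linear fade are illustrative assumptions, not the Latent Composite node's actual source.

```python
# Conceptual sketch of a feathered composite of one latent batch over another.
# Latents use a [frames, channels, H/8, W/8] layout (1024x576 -> 128x72).
import torch

def feathered_composite(samples_to, samples_from, y=0, feather=32):
    out = samples_to.clone()
    f, c, h, w = samples_from.shape
    for row in range(h):
        if row < h - feather:
            alpha = 1.0                       # fully the pasted (aurora) latent
        else:
            alpha = (h - row) / feather       # linear fade toward the base clip
        out[:, :, y + row, :] = (
            alpha * samples_from[:, :, row, :] + (1 - alpha) * out[:, :, y + row, :]
        )
    return out

base = torch.randn(25, 4, 72, 128)            # 25-frame forward-motion latent
aurora = torch.randn(25, 4, 32, 128)          # cropped aurora-region latent
# Feather set to the pasted region's height, echoing the video's suggestion.
blended = feathered_composite(base, aurora, y=0, feather=32)
print(blended.shape)                          # torch.Size([25, 4, 72, 128])
```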
Info
Channel: Control+Alt+AI
Views: 3,774
Keywords: stable video diffusion, stable diffusion video generation, stable diffusion video tutorial, stable diffusion long video, ai video generator, stable diffusion video, stability ai stable video diffusion, stable video diffusion free, image to video ai, ai video editing, stable video diffusion clip extension, turn image to video, stable video tutorial, stable diffusion video consistency, how to use stable diffusion video, comfyui image to video, comfyui img2video
Id: rfh5Ur1UupU
Length: 27min 50sec (1670 seconds)
Published: Sun Dec 24 2023