How to create great long Stable Diffusion videos with AnimateDiff and ComfyUI

Captions

Hello and welcome. We are the Pixelated Poets, an AI fusion band. In our music videos we heavily utilize Stable Diffusion and AnimateDiff, and in particular we use scenes that last longer than the perceived limit of 3 seconds, which you can see in our music videos. If you are interested in seeing our stuff, you can find us on YouTube at Pixelated Poets, and our music is out on all streaming platforms as well.

So let's dive into this repository. We've created a GitHub repository with the username pixl poets. You'll find the URL, and this is the repository named longer AnimateDiff movies. You will actually need the workflow JSON. When you clone the repository, or you just copy over this JSON, you can download it and load it up in ComfyUI by pressing the Load button over here. Then you select the JSON file and it will replicate the entire workflow that we have used. You will need an input image of the pose that we want to emulate, but I will get into that in a bit.

First we start with the checkpoints. We are using a Stable Diffusion 1.5 model, DreamShaper 8, which creates amazing images. AnimateDiff recently released a beta for Stable Diffusion XL, but we're not using that. It's still in beta and the results are not that spectacular for the moment; 1.5 gives better results. In addition to DreamShaper we're using the LCM LoRA, and that is basically to speed up the generation of latent images, which works quite nicely and doesn't affect the image quality.

Now we have a prompt right here. Our prompt is "Grimes, skirt being uplifted by strong winds", and we have a negative prompt. This is very important: when you're making YouTube videos it has to be safe for work. You cannot have nudity in it at all, because you will get reported and flagged, which is what we don't want.

We're going to be generating 24 frames. The frames are generated by the KSampler and animated by AnimateDiff, which basically predicts which movements are made. The end result is a movie. In this case I have selected MP4. You can do this as a GIF as well, like we did in the repository. When you select MP4 the quality will be a bit better, because there's more color depth in MP4 than in a GIF.

Now, I mentioned that there is a perceived limit, that you can only create movies 3 seconds long, and this is incorrect. The reason people think that is because they do not have the Repeat Latent Batch step. When you don't have that, after a second or three AnimateDiff will lose context of what it was doing, and it will wander off and hallucinate whatever it wants to hallucinate, and you will not have a good video. So this little trick adds a Repeat Latent Batch. You create 24 latent images on which it can hallucinate, and after generating those 24 latent images it will duplicate them three times. That does not mean that you get the same animation three times, but you constrict what the model can hallucinate on, and therefore you restrict it to those boundaries, creating a coherent whole overall.

Of course this one has already been generated, and I usually use a fixed seed, so let's now change the seed and let's change the prompt. We want to generate a different kind of image, but using the same pose. And I forgot to explain this: in the workflow I have also added ControlNet with DWPose estimation. What DWPose will do is look at the image and draw a stick figure of what it thinks the subject of the picture is posed like, and with the strength here it will enforce in the frames the particular pose that it detected. I have used a strength of 0.7, which gives the end result, the end frames, a little bit more flexibility. If I would up this to one, it would be way more static and you would have a lot less movement, so this is to give it a little bit more freedom. The strength is 0.7, we start at the beginning and we end at the end. If I would do this ending at 0.5 there might be a little bit more movement later on, but this gives more predictable results. And you can change this image for any pose that you like.

So what happens now when we queue this is that the latent images will be generated, then replicated, and then collected by the Video Combine to create a coherent whole. So now let's indeed change the prompt. The skirt is being uplifted by strong winds, I think that is still nice. Let's say there is a backdrop of galaxies and stars, and I want her to have some color, with pink hair. Skirt being uplifted, it's perfect. Then we go to Queue Prompt, which will render it all. This will take a while, so I will skip ahead to the result.

And there it starts to decode the image, and now we have a completely new image that corresponds with our prompt: Grimes with pink hair, skirt being uplifted by strong winds, and a backdrop of galaxies and stars. I do not completely see the galaxy out there. The stars, yes. That's cool. So now we have 9 seconds of video, more or less. We have a frame rate of eight, so we have nine seconds of coherent video, and that's how you create longer-context movies with AnimateDiff.

You could do this without a pose, of course. You could just have a text prompt and do the same. Just remember to repeat the latent batches; that is the trick here. I hope you enjoyed this tutorial and that you find it useful. Leave your thoughts in the comments, and it would be cool if you could check out our YouTube channel. We have a new song out that has a ridiculously low number of views, so please do give us some love. Thank you.
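The numbers quoted in the video (24 frames, repeated three times, played at a frame rate of eight) can be checked with a small sketch. This is only an illustration of the arithmetic, not the workflow itself: the helper names `repeat_batch` and `clip_duration_seconds` are hypothetical, a plain Python list stands in for the latent images, and it assumes the repeat step places whole copies of the batch end to end.

```python
def repeat_batch(latents: list, amount: int) -> list:
    """Hypothetical stand-in for the repeat step: copy the whole batch `amount` times."""
    # Assumption: copies are placed end to end (a, b, c, a, b, c, ...).
    return latents * amount


def clip_duration_seconds(total_frames: int, frame_rate: float) -> float:
    """Length of the final clip when every frame is shown once at `frame_rate` fps."""
    return total_frames / frame_rate


# Values taken from the video: 24 frames, repeated 3 times, frame rate of 8.
batch = list(range(24))            # placeholders for 24 latent images
repeated = repeat_batch(batch, 3)  # 72 placeholders

print(len(repeated))                            # 72
print(clip_duration_seconds(len(repeated), 8))  # 9.0
```

With these values the sketch gives 72 frames and 9 seconds, which matches the "9 seconds of video, more or less" mentioned at the end of the video.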

Info

Channel: Pixelated Poets

Views: 4,164

Keywords: stable diffusion, animatediff, ai music video, tutorial, ai video, svd, Stable Video Diffusion

Id: znXk662AgOs

Length: 9min 9sec (549 seconds)

Published: Wed Nov 29 2023