Will AnimateDiff v3 Give Stable Video Diffusion A Run For Its Money?

Video Statistics and Information

Captions
Big things are happening in the AnimateDiff world. They've recently released these new version 3 models, which seem hotter than a dragon's breath after eating a chili burrito. But the fun doesn't stop there, because there are also a couple of LongAnimateDiff models from Lightricks, one of which was trained on up to 64 frames. That's twice as long as the others. But the burning question is: how do all of these new models actually hold up?

With AnimateDiff version 3 comes the release of four new models: a domain adapter, a motion model and two sparse control encoders. Not sure what any of that jargon means? Well, thankfully a picture paints a thousand words, and they've given us some to look at. The first one here is for the RGB image conditioning. An RGB image is basically a normal picture, so to speak, which makes this akin to Stable Video Diffusion, the model from Stability AI which also lets you animate from a static image. As you're probably aware, though, Stable Video Diffusion is limited by a license which does not allow for commercial use unless you pay them a monthly fee. For an educator like me, such monthly fees don't make any sense at all, piled on top of all the current expenses of putting out these videos. After all, my ongoing mission here is to empower nerds rather than to line my pockets. If you're facing a similar dilemma, then rejoice, as the license here is free as a bird, with no paywalls to clip those creative wings. Finally, us nerds can animate images without breaking the bank.

Version 3 doesn't just animate single static images, though. Oh no: if you look at the scribble example, you'll see not only do they convert a single scribble into an animation, but they also do it using multiple scribbles (three in this case). So it looks like you're able to guide your animation based on these multiple inputs, a sort of sweeping zoom in this case. Unfortunately, I haven't yet found a way to use these sparse controls outside of their own implementation, but I'm sure it will
only be a matter of time before something is available for either Automatic1111 or ComfyUI. Knowing my luck, more than likely that will be about 10 minutes after I release this video. Anyway, the LoRA and the motion module files are good to go in both Automatic1111 and ComfyUI right now; all you have to do is pop them into the usual place for whichever interface you prefer. I'll start out here by showing Automatic1111, then switch over to ComfyUI to run the comparisons. As you may be aware, the problem with trying to use Automatic1111 for video comparisons is that it's limited to just a single output, whereas in ComfyUI you can put everything side by side.

Obviously you'll need to have the AnimateDiff extension installed; the version I'm using here in Automatic1111 is from December 20th. If you need more detailed instructions, do see my previous AnimateDiff video as well as the various GitHub pages. The GitHub page for the Automatic1111 AnimateDiff extension also has a really handy link to some fp16 safetensors files, which also happen to work in ComfyUI. This type of file is great because not only are they safer to use, but the file size is smaller too. There you can see version 3 weighing in at just 837 MB, which helps to save both load time and valuable disk space.

Okay, let's get to prompting and testing. As the version 3 model comes with a LoRA, you'll need to enter whatever prompt you like but also select the LoRA as well. Here I'm going to put in an absolutely amazing prompt, and then to add the LoRA you just click on the Lora tab over there. I'm going to use the search here to narrow it down to "mm". There we can see mm_sd15_v3_adapter; we click on that, and it adds the text into your prompt at the top. Then we can pop back to Generation. If we scroll down a bit, we've got the AnimateDiff panel here, which you can expand. Now, this is the version 3 module I'm selecting there, and, well, almost everything else is good. I'm going to change these save formats over to WebP and just leave
pretty much everything else on the defaults, apart from remembering to select Enable, otherwise you'll just get a static image. If you're looking for a full breakdown of what each of those options does, then check out the GitHub page, which has a really detailed write-up; you can get there easily by clicking the link right there, the first thing in the panel.

Okay, so this is basically a default run: we're getting 16 frames of a rodent. We'll generate and see what happens. Awesome, that's done exactly what I asked for; I've got a rodent riding a motorcycle. There is only a limited amount of animation there, but then it's only 16 frames. As you can see, it's really easy to use version 3 in Automatic1111, but how does this compare to version 2, and what about those LongAnimateDiff ones? Well, it's time to switch interface over to ComfyUI in order to see all those models at once.

Just for giggles, let's build this up from scratch as well. We'll start with a standard model over here; I've got these in node templates, and I'm just going to load everything in. I like to stick all these things in groups as well, so I can move them around and see what each bit does. Prompts are good too, so let's add some of those in. As we're going to do a comparison here, I'm going to need four sets of AnimateDiff groups: group one will be AnimateDiff version 2, group two is going to be the new version 3 model, and for groups three and four I'll put in the different LongAnimateDiff ones. That first one is absolutely fine on the defaults here. For the second one, we're going to use the new version 3 model. Now, this also has a LoRA as well, if you remember from Automatic1111, so we may as well pop one of those in too. Let's put it up here; I'll select the version 3 adapter and then link that into the model. There we go. Okay, now the next ones down here are for the LongAnimateDiff models, and these do need slightly different options too: I'll use the 32-frame one in that one and the 64-frame one in that one. If we check on their GitHub
page, we'll see they suggest a motion scale of 1.28 for the 64-frame model and 1.15 for the 32-frame one. Right, so those should now have the settings they suggest. Obviously we're going to need a KSampler for each of those, so let's add four of those in, and a little bit of interpolation too, just to smooth things out. Right, so that's got all four of those in, and obviously we need to connect up the models as well. We're almost done, but we do need some latents; let's pop that over here, in with a LoRA loader, and to start off we'll use a batch size of 16. It's probably a good idea to use the same seed for each sampler in order to do a comparison, so let's pop a seed in there, and I'll also convert each of these (convert "noise seed" to input, there we go); I'll just do that on all four samplers. I think that's all we're going to need, apart from updating the prompt. We may as well use the same one as in Automatic1111, although actually, for these long ones, let's take the context options off, as that limits things to 16.

Okay, let's run that through and see what happens. Okay, those look to have finished; let's put them up side by side to make it a little bit easier to see. So we've got the standard version 2, version 3, the 32 and the 64 context. Obviously the version 2 looks pretty good, version 3 is quite nice, and the LongAnimateDiff ones also look sort of okay. Personally, I kind of prefer the first two, and obviously the generations will vary depending on what seed you put in. So let's increase that context a little bit and change the seed: we'll go for a batch size of 32 and pop the seed up. Let's run that through again and see what happens. Hopefully these LongAnimateDiff models with that larger context will do a little bit better, but we'll find out in time. Ah yes, let's do that time warp!

All right, so now we've got the results with 32 frames, and I think they're all not too bad. Certainly LongAnimateDiff seems to have done a little bit better with that higher context. My favorite is still the original version 2. The version 3 is, of
course, primarily for sparse control, but as you can see it works great, just like the version 2 model. The LongAnimateDiff one there with the 32 frames, well, it seems okay, and the 64-frame one again seems sort of okay. Now, they are a little bit wibbly, and that's something you can help to control with an input video and ControlNets, things like that. So let's do exactly that and pop in a video instead of these empty latents. There's the video; let's just connect up the latent, so instead of that empty one we're going to have that woman. Now, there's no motorcycle in there, so it's probably a good idea to update this prompt as well. All right, now I've got the video input, let's run that through and see how those come out.

And with rendering complete for each of those, I think you can see once again each of the models has given a slightly different output. I think I prefer version 3. Version 2 is very nice; I love the way her face turns into an actual rodent, that's pretty good. And the two LongAnimateDiff ones? I don't know; I quite like the last one, that's not too bad, but your mileage of course may vary. As mentioned before, the main thing with version 3 is the stuff we can't actually use yet: the sparse controls. But it still works very well with both text-to-image and image-to-image, and once we do get those sparse ControlNets for version 3, I think that's going to be a real game changer. As it's the festive season, I'll also take this opportunity to wish you a good one, and I reckon 2024 is going to bring us even more amazing geekery!
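A quick aside on the fp16 point made in the walkthrough: half-precision stores each weight in 2 bytes rather than 4, so the file is roughly half the size of an fp32 checkpoint. A back-of-the-envelope sketch (the helper function and the implied parameter count are illustrative, not taken from any of these tools):

```python
def checkpoint_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Approximate raw weight storage on disk, ignoring file headers and metadata."""
    return num_params * bytes_per_param / (1024 ** 2)

# Working backwards: an 837 MB fp16 file implies roughly 439 million parameters.
num_params = int(837 * 1024 ** 2 / 2)

fp16_mb = checkpoint_size_mb(num_params, 2)
fp32_mb = checkpoint_size_mb(num_params, 4)
print(f"fp16: {fp16_mb:.0f} MB, fp32: {fp32_mb:.0f} MB")
# → fp16: 837 MB, fp32: 1674 MB
```

So the fp16 safetensors file saves close to 840 MB of disk space per motion module compared with an fp32 dump, before any safetensors-specific savings.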
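On the "context options" that cap a batch at 16 frames: motion modules are trained on short clips, so longer animations are typically processed as overlapping 16-frame windows whose results get blended together. A toy sketch of that windowing idea, assuming a simple fixed stride (the real AnimateDiff context schedulers, e.g. the uniform or looped variants, are more sophisticated):

```python
def context_windows(num_frames: int, window: int = 16, overlap: int = 4) -> list[range]:
    """Split a long frame sequence into overlapping windows of `window` frames.

    Each window overlaps the previous one by `overlap` frames so the blended
    transitions stay smooth. The final window is pinned to the sequence end.
    """
    stride = window - overlap
    windows = []
    start = 0
    while start + window < num_frames:
        windows.append(range(start, start + window))
        start += stride
    windows.append(range(max(num_frames - window, 0), num_frames))
    return windows

# A 32-frame batch with 16-frame windows and a 4-frame overlap:
for w in context_windows(32):
    print(f"frames {w[0]}..{w[-1]}")
```

This is why turning the context options off limits LongAnimateDiff to 16 frames: without windowing, the batch size itself has to fit what the motion model can handle at once.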
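Why share one seed across all four KSamplers? Diffusion sampling starts from random noise, so only a common starting point makes the motion-model comparison apples-to-apples; otherwise you would be comparing seeds as much as models. A toy illustration using Python's standard-library RNG in place of a real latent tensor:

```python
import random

def make_noise(seed: int, n: int = 8) -> list[float]:
    """Toy stand-in for a sampler's initial latent noise: a seeded Gaussian draw."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Four "samplers" sharing one seed all start from identical noise...
seed = 42
draws = [make_noise(seed) for _ in range(4)]
assert all(d == draws[0] for d in draws)

# ...while a different seed gives different noise, and hence a different animation.
assert make_noise(43) != make_noise(42)
```

With the seed converted to an input and wired to all four samplers, any difference you see in the outputs comes from the motion module alone.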
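And on the interpolation nodes used to smooth things out: the simplest possible version is a linear cross-fade between consecutive frames. Real interpolators (RIFE, FILM and friends) estimate motion rather than blending pixels, but the frame-count arithmetic is the same; at factor 2, a 16-frame batch becomes 31 frames. A minimal sketch, treating each frame as a flat list of pixel values:

```python
def interpolate_frames(frames: list[list[float]], factor: int = 2) -> list[list[float]]:
    """Insert (factor - 1) linear blends between each pair of consecutive frames.

    Purely a cross-fade: real interpolation nodes estimate motion and do far better.
    """
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for k in range(1, factor):
            t = k / factor
            out.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    out.append(frames[-1])
    return out

frames = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # three tiny two-pixel "frames"
smooth = interpolate_frames(frames, factor=2)
print(len(smooth))  # → 5: the originals plus one midpoint between each pair
```

Doubling the effective frame rate this way is what makes the 16- and 32-frame outputs in the comparison feel less steppy without asking the motion model for more frames.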
Info
Channel: Nerdy Rodent
Views: 10,523
Keywords: animatediff, AnimateDiff, AnimateDiffv3, AnimateDiff v3, animatediff v3, A1111 AnimateDiff, ComfyUI AnimateDiff
Id: fxlRUTvD7O8
Length: 11min 32sec (692 seconds)
Published: Fri Dec 22 2023