New Image2Video. Stable Video Diffusion 1.1 Tutorial.

Captions
Image to video has been around for some time with Pika and Runway, but now Stability AI has dropped Stable Video Diffusion 1.1, which is a fine-tune of their previous 1.0 model. Obviously, you input an image and you get a video out. Is it better than the previous one? That's what we're going to find out, and I'm also going to show you how to get it running. Oh, and remember guys, if you want to help me out, check out the Patreon links below. That is my main source of income and how I create these videos, so if you're feeling generous, thank you. I've also got some extra files and stuff on there that's not available on YouTube. Oh, and did you know that the word "incorrectly" is spelled incorrectly in the dictionary?

I'm going to be showing this in ComfyUI. It's not available in Automatic1111, but it is available in a fork of Automatic1111; I'll show you that in a bit. But first off, let me show you the ComfyUI stuff. So here we have the nodes; the workflow is going to be available in the description. We're going to input an image, which is this one, just a random Stable Diffusion image, and it's going to run through these nodes into a KSampler, and we're going to get an output that looks, well, something like this. We're also going to compare the new model with the old model, which is this one here, and we're going to do it for a couple of images.

First, let me show you how to get it running. Again, all the links are going to be in the description below. This is Hugging Face, for Stability AI's Stable Video Diffusion img2vid-xt 1.1, and there's not a lot you need to know, but you do need to know this: the model was trained to generate 25 frames at a resolution of 1024 by 576, so that's what we're going to use. You also need to know this, well, you don't strictly need to know it because it's the default, but if you start fiddling with the settings, it's good to know that fine-tuning was performed with fixed conditioning at 6
frames per second and a motion bucket ID of 127. That is not the frames per second you set here in the Video Combine node; it's the frames per second you set here in this node. So this value here, the 6, and the motion bucket ID of 127 here are set to these defaults. Don't mess about with them, it's just going to break your Stable Video Diffusion, unless you want to go wild and test a lot of stuff. The frame rate in Video Combine, though, you can change however you want.

Next, go into "Files and versions" here and find this one, svd_xt_1_1.safetensors, and click the little download button. If you're using ComfyUI, go into your ComfyUI/models/checkpoints folder and drop it straight in there. If you're using an Automatic1111 fork, like the one I'm going to show you in a second, go into stable-diffusion-webui/models/Stable-diffusion. That's the main difference there.

Then take the workflow from my description and drop it into your ComfyUI. You're only going to see half of it; you won't see these ones or this one, because I basically cloned this workflow and made another one for the comparison. So this is what you're going to see. Select the model, SVD XT (I actually renamed it to "25 frames"), and I set the size to 1024x576. Name it whatever you need. Drop an image in (in this case I dropped this one in), press Queue Prompt, and you get magic.

Now, as I've already prepared this one, we can actually see the difference. I'm going to color these nodes for you: the blue one here is the new one, and the one on the right, the red one, is the old Stable Video Diffusion model. In this example I would say the new one is actually better, especially if you look at the car here and the taillights. It is much, much better, and it keeps consistency as the camera moves forward. As you can see, it's
not a zoom; it's actually the camera moving forward. The characters and people here are not great, but that's the case in both models. If you look at the brake lights on the right one, they're not fantastic, and the neon sign up here to the right is kind of mushy and warping, while in the new model, on the blue node, it looks much more consistent and stays in shape. That's the first example.

Now let's look at the Automatic1111 fork, the way you can run Stable Video Diffusion if you hate ComfyUI but still want to run this. I haven't tested it yet, so that might be up to you guys, but I've been told it works. Forge is a fork of Automatic1111, so if you scroll down and look at this, you can see it looks basically exactly the same. That's just a tip; check it out. This is not the guide for that; that will probably be a later video.

Let's head back in here and check some other comparisons. I'm going to load up another image; let's take this hamburger here. Beautiful burger. I'm getting hungry just looking at it, so I'm going to have a second dinner after this. You wouldn't notice it, since I'm such a lean guy, but for some reason it works. So the blue one is the new model and the red one is the old model. On the left one we have movement of the background and a static hamburger, well, a static rest of the image too. Actually, the old model is performing better here, because we're getting some sort of rotation of the burger, and it looks pretty fantastic. I would say this is amazing. The French fries in the background stay consistent; everything in this one stays very, very consistent. There's a little extra movement in the background, and the lights are kind of disappearing there, but on the burger and what's on the table, we actually get something appearing here behind it, and there's some slight warping on
the salad there to the right. Apart from that, well, this is the first case I've seen where the old model performs better. I'm sure this will be the case moving forward.

Let's check the next one. I have this image here of a floating market. It's a painted style, so I'm going to drop that in. I think this is going to be a hard image to work with, because we have characters, or people, and they're not going to look very realistic once this starts moving. But we'll see.

The image has finished now, and we can see in both scenes that all the little boats here stay fairly consistent. This character here is kind of being warped out; I don't think the model understands it's a person, and it's not great in either scene. I would say this is probably a tie. Just looking at the left part here, these lamps stay fairly consistent, as it's a slower zoom on the left one. Something I noticed with Stable Video Diffusion 1.1 is that the zooms, or the movements, are a little bit slower, which helps in keeping consistency. You can also see that the far background over here is moving less, whereas it's moving more in the old one. So I started with a tie, but I would say the new one slightly wins this one.

Look at this comparison here. You can see what I talked about regarding Stable Video Diffusion 1.1 having slower movement: the left one here is moving much, much slower and the right one is moving much quicker, so it's easier for the new one to keep consistency. Both of these actually look pretty good, but if you look at the details, the stone in the left one is actually pretty decent, while the right one is a little more blurry. So in general, a little better in this example.

Next I'm going to check this image of a cherry blossom tree in a garden. Okay, so here we have quite a clear winner:
the left one, the new one, keeps the scene consistent. It's not perfect by any means; however, the old one has some sort of jiggly movement going on, so not great at all.

I'm going to show you this rocket launch next, just dropping that in, and dropping that in. There's a lot going on in this scene in particular: we have the rocket ship, the blast, or the smoke coming out of the rocket, and the stars in the sky. So this is a hard picture. I think the rest of the scene is going to be fine, but let's see what Stable Video Diffusion does with the stars.

Oh, and while you're here, let me remind you about my Discord, where we have 7,000 enthusiastic AI art and generative AI people. We also have a weekly AI art challenge; this week it's Cyberpunk Adventures. Here are some of the submissions: we have a bar district here, future billboards, and the lone figure looks fantastic, I'm going to press a star on that. I like the dystopian burgers here, that's nice. Come in and take part in the weekly challenge.

Our image here has completed. The rocket ship turned out okay; the blast and the smoke here are, well, consistent at least, and the right one is also fairly good. The stars are messed up in both pictures, which is sad, because in the test I did earlier Stable Video Diffusion 1.1 performed better and kept the stars consistent. It didn't this time, so sorry about that, but in most of my tests it has been more consistent.

Anyway, I'm not going to bore you with too many of these comparisons. What I've noted so far is that Stable Video Diffusion 1.1 is a little better, except in the burger case, and I don't think you need to use the older model at all. In most cases Stable Video Diffusion 1.1 performs better; if it doesn't perform as expected, just use a different seed and get a new generation. I hope you learned something today. As always, have a good one. Oh, and like and subscribe, guys. Like and subscribe.
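The settings and file locations described in the video can be collected into a small checklist. The Python sketch below is hypothetical and is not part of ComfyUI, Forge, or the workflow linked in the video. It records the fixed conditioning values quoted from the model card (25 frames, 1024x576, conditioning fps 6, motion bucket ID 127), treats the Video Combine frame rate as a free choice, and builds the checkpoint destination using the folder names mentioned in the video. The names `TRAINED_CONDITIONING`, `MODEL_DIRS`, `conditioning_deviations`, `checkpoint_destination`, the `"a1111-fork"` key and the `"output_fps"` setting are all assumptions for illustration.

```python
from pathlib import PurePosixPath

# Checkpoint file name as shown under "Files and versions" in the video.
CHECKPOINT_NAME = "svd_xt_1_1.safetensors"

# Fixed conditioning values quoted from the model card in the video.
# The video advises leaving these alone (hypothetical lookup table).
TRAINED_CONDITIONING = {
    "video_frames": 25,
    "width": 1024,
    "height": 576,
    "conditioning_fps": 6,     # fps in the conditioning node, NOT the Video Combine fps
    "motion_bucket_id": 127,
}

# Model folders per UI, as named in the video (hypothetical keys).
MODEL_DIRS = {
    "comfyui": PurePosixPath("ComfyUI/models/checkpoints"),
    "a1111-fork": PurePosixPath("stable-diffusion-webui/models/Stable-diffusion"),
}


def conditioning_deviations(settings: dict) -> list[str]:
    """Return names of conditioning values that differ from the trained setup.

    Keys not in TRAINED_CONDITIONING (e.g. a hypothetical "output_fps" for the
    Video Combine node) are ignored, since that frame rate can be changed freely.
    Missing keys are treated as left at their defaults.
    """
    return [
        name
        for name, trained in TRAINED_CONDITIONING.items()
        if settings.get(name, trained) != trained
    ]


def checkpoint_destination(ui: str) -> PurePosixPath:
    """Return the path where the downloaded checkpoint should be placed."""
    if ui not in MODEL_DIRS:
        raise ValueError(f"unknown UI {ui!r}; expected one of {sorted(MODEL_DIRS)}")
    return MODEL_DIRS[ui] / CHECKPOINT_NAME
```

For example, `conditioning_deviations({"output_fps": 12, "motion_bucket_id": 180})` flags only `motion_bucket_id`, because the output frame rate is deliberately left out of the check. Relative paths are used so the same table works regardless of where each UI is installed.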
Info
Channel: Sebastian Kamph
Views: 13,492
Id: ue1qHBvlurA
Length: 10min 50sec (650 seconds)
Published: Tue Feb 13 2024