The Future of AI Video Has Arrived! (Stable Diffusion Video Tutorial/Walkthrough)

Video Statistics and Information

Captions
Hey everyone, so a new AI video model has just dropped from Stability AI, and it is pretty fantastic. And look, I know a lot of you hear "Stable Diffusion" and start thinking it's going to involve a complicated workflow, or that you're going to need a powerful GPU to run it, but don't worry, I've got you covered. Today we're going to take a look at Stable Video Diffusion, and I'll show you a number of different ways to run it, even if you're on a Chromebook. Plus, we're going to circle back to FinalFrame, a tool that I have discussed here in the past; there have been some updates, and there's a roadmap with some pretty cool features coming up in the future. Okay, let's dive in.

So Stable Video Diffusion is image-to-video, currently at least; text-to-video is coming, but at the time of this video it has not been released yet. The model is described by Stability as being trained to generate short video clips from image conditioning. It's trained to generate 25 frames at a resolution of 576 by 1024, and there is another fine-tuned model that runs at 14 frames. Overall, yeah, I know what you're thinking: 25 frames, what am I going to do with that? Well, there are some tricks; we're going to take a look at those in one second. The thing is, those 25 frames can actually be pretty stunning. Let's take a look at this example from Steve Mills: the fidelity and quality of these videos look pretty fantastic. You'll notice that they run a little bit longer, somewhere in the neighborhood of 2 to 3 seconds; I'll explain how that's done in just a few minutes. But yeah, overall it looks really, really good. I will give you that the car is a little bit on the wonky side, but maybe it was an icy day. Now, to be fair, Steve's outputs were upscaled and interpolated by Topaz.
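As a quick sketch of how 25 raw frames can stretch into a 2-to-3-second clip, the arithmetic below shows the two levers involved: slowing the playback frame rate, and frame interpolation. The "2n - 1" interpolation rule is an assumption about how a typical 2x interpolator (RIFE-style) works, inserting one synthetic frame between each neighboring pair; exact tools may differ.

```python
def clip_seconds(num_frames: int, fps: float) -> float:
    """Playback duration of a clip with num_frames frames at fps."""
    return num_frames / fps

def interpolate_2x(num_frames: int) -> int:
    """Assumed 2x interpolator behavior: one synthetic frame is inserted
    between each neighboring pair, so n frames become 2n - 1."""
    return 2 * num_frames - 1

# SVD's 25 raw frames at different playback rates:
print(clip_seconds(25, 25))  # 1.0 s: full speed, very short
print(clip_seconds(25, 12))  # ~2.1 s: slower playback buys length
print(clip_seconds(25, 6))   # ~4.2 s: longer still, but choppy

# Interpolating 2x and playing at 24 fps keeps it smooth AND ~2 s long:
print(clip_seconds(interpolate_2x(25), 24))  # 49 frames -> ~2.04 s
```

This is why upscaled-and-interpolated outputs can run noticeably longer than one second while still looking smooth.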
If you're wondering how much of an effect that has, Max has actually provided us with a nice side-by-side comparison: the top version is the straight-out-of-the-box Stable Video Diffusion, while the bottom is the same output run through Topaz. Just as a quick sidebar, you are looking at compressed video that has been recompressed onto YouTube, so this probably isn't the best thing to judge on. That said, I do think you can probably see a noticeable difference between the two. If you can't afford Topaz, which I totally understand, I do have a few suggestions on how to upscale and interpolate that aren't quite as pricey; we'll get to those in just a little bit.

Anu Aosh put together a really cool side-by-side comparison of Stable Video Diffusion against the various other image-to-video platforms in terms of action and motion. On the left-hand side, I thought all three did a pretty serviceable job. Runway did lose that candle in the middle, although at the beginning it almost looks like the kid is blowing out the cake; and that fork, or whatever that is, on this side just kind of vanishes. I do appreciate that in the right-hand video Runway changed the kid's expression as he's making this mess into "uh-oh, I'm in trouble here."

Now, one other downside is the fact that we don't actually have camera controls yet in Stable Video Diffusion, although those will be coming soon via custom LoRAs. That said, we do have controls for the overall level of motion. Once again, Anu Aosh puts together this comparison showcasing various levels of motion control. As we can see here, motion 50 has roughly the level of speed that I think we've all gotten used to with image-to-video, whereas motion 180 and motion 255 have a lot more dynamics and speed. It reminds me a little of early Pika, back when that was really crazy and wild, but far more temporally coherent. And despite the shorter clip length, I do think that if
you lean into that, you can get some really creative results, as Pers did here by taking a number of stock and Creative Commons images, animating those, and creating a collage. The speed of everything gives it a really nice commercial feel.

Now, one big thing with Stable Video Diffusion is its understanding of 3D space, which is probably why we'll see more coherent faces and characters as this model moves forward. On a very basic level, Kaai Zang was able to illustrate this by uploading three images of a sunflower, feeding those into Stable Video Diffusion, and getting a 360 turnaround. A more practical look is this example image that Stability had posted, sort of a Zelda-esque inspired image. You can see these are separate shots: a pan to the right, a pan to the left, a static shot, and then a pan up and down. But overall the environment stays very consistent despite these being three, or I guess four, separate shots.

Now, in terms of using Stable Video Diffusion, you do have a number of options. If you want to run it locally, probably the easiest thing to do is to use Pinokio. On the one hand, if you're not already familiar with ComfyUI, you will need to spend some time exploring that workflow if you decide to go this route, but the plus side to using Pinokio is that it is one-click installing. Literally, once you have Pinokio downloaded, all you'll need to do is open it up, come over to Stable Video Diffusion, click on that, and then hit download. Now, this currently only supports NVIDIA GPUs, so if you're on the Mac side you're kind of out of luck for now; that said, I believe there should be a Mac version sometime next week.

You can also try it out for free on Hugging Face; it is the Stable Video Diffusion space down here. Simply click on that, upload an image, and once that's uploaded, hit the Generate button. Now the
thing is, since this is on Hugging Face, chances are you will probably run into "too many users" errors unless you're on completely off-peak time. So, another option for you: you can head over to Replicate, which is kind of my non-local go-to for Stable Video Diffusion. Via Replicate you can run a number of generations for free; at some point they will ask you to pay, though the price is actually pretty reasonable, I think about 7 cents per output.

Just to run you through Replicate really quickly: once you have your image uploaded, you'll have the choice between using either the 14-frame or the 25-frame Stable Video Diffusion model. You then have a few selections for your aspect ratio: maintaining it, cropping to 16:9, or using the image dimensions. Frames per second is where you can play with the output length of your video. I know that's not a ton to play with, but obviously if you crank it up to 24 or above you're looking at a 1-second video; as you start moving the frame rate down to, say, 12, you can buy yourself a 2-second video, and lower than that gets you a 3- or possibly even 4-second video. Motion bucket controls the overall amount of motion in your output video: the higher you crank it, the more motion you will see; the lower, the less. Conditional augmentation controls the amount of noise that's added to your initial input image; the more noise you add at the beginning, the more varied your results are going to be. I would just leave that at the default of 0.02, which seems to have a lot of temporal consistency. Decoding t has to do with the number of frames that are decoded at any one time; it's set to 14, but feel free to play around with it: try 21, try 7, see what happens.

And in terms of video upscaling and interpolation, as it turns out you actually don't need to leave Replicate in order to do that. For interpolation you can
use RIFE video interpolation; I've used this on the channel in the past and it works very well. There is also a video upscaler that will take your videos up to 2K or 4K. Links for both of these are down below.

Coming up for Stable Video Diffusion, there are improvements already being made to the model, such as text-to-video, 3D mapping, and of course longer video outputs. And look, I know that the shorter video length is not thrilling anyone; just remember, this is all just getting started. That said, if you do want to extend your video clips, we do have a tool for that as well.

So let's head over to FinalFrame with a clean slate, and I'll show you how this all works. Benjamin Deer, the creator of FinalFrame, has added in this AI image-to-video tab. Hit that, upload an image, and FinalFrame begins processing it; obviously, we now have that image in motion. Now here's where things get interesting: we can come up to this Add Video tab and add in, say, something from Gen-2, and once that video has been processed, you can see in the project preview tab that these two clips are now merged together. Obviously, from here you can add in more video or do more AI image-to-videos. What's kind of cool is that you've got a timeline here where you can rearrange your clips if you want to. I wouldn't necessarily call this editing per se; it's more like arranging a stringout. Once you're happy with everything, you can just hit the Export Full Timeline button, and indeed you have everything strung together as one continuous file.

A couple of quick things: you'll note that there are Save Project, Open Project, and New Project buttons. I have spoken to Benjamin about this; those don't work yet, just as an FYI. So if you do close your browser, you will lose the work that you've put in, so be careful there. That said, those features will be coming along. Something to keep in mind is that FinalFrame is the project of one person, so let's give him some latitude here. But these are the types of
tools and projects that I really like to showcase on the channel: things that are indie-made and developed by someone from the community. So if you have a moment, head on over to FinalFrame; Benjamin is asking for suggestions and feedback to improve it. So, that's another crazy day of AI video advancements. Hey, if you haven't had a chance, please do hit the like and subscribe buttons. Thank you for watching, my name is Tim.
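For anyone who would rather script Stable Video Diffusion than use Pinokio's one-click install or a ComfyUI workflow, here is a minimal sketch using Hugging Face's diffusers library. It assumes the publicly released 25-frame "img2vid-xt" checkpoint, an NVIDIA GPU, and that `torch` plus `diffusers` are installed; the defaults mirror the knobs discussed above (motion bucket, conditional augmentation, decoding t).

```python
def generate_clip(image_path, out_path="clip.mp4",
                  num_frames=25,            # the 25-frame img2vid-xt checkpoint
                  fps=7,                    # lower fps = longer (choppier) clip
                  motion_bucket_id=127,     # overall amount of motion
                  noise_aug_strength=0.02,  # "conditional augmentation" noise
                  decode_chunk_size=14):    # frames decoded at once (VRAM knob)
    """Animate one still image with Stable Video Diffusion.

    Requires an NVIDIA GPU plus the torch and diffusers packages; the
    imports are deferred so merely defining this function costs nothing.
    """
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import load_image, export_to_video

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        torch_dtype=torch.float16, variant="fp16").to("cuda")
    image = load_image(image_path).resize((1024, 576))  # model's native size
    frames = pipe(image, num_frames=num_frames,
                  motion_bucket_id=motion_bucket_id,
                  noise_aug_strength=noise_aug_strength,
                  decode_chunk_size=decode_chunk_size).frames[0]
    export_to_video(frames, out_path, fps=fps)
    return out_path
```

Raising `motion_bucket_id` toward 255 gives the wilder, early-Pika-style motion shown in the comparisons above, while lowering `fps` trades smoothness for clip length.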
Info
Channel: Theoretically Media
Views: 109,320
Id: 82l0DsbLHhY
Length: 10min 36sec (636 seconds)
Published: Tue Nov 28 2023