Stable Diffusion 3 And Stable Video 1.1 Updates - What Will Be Next From Stability AI?

Video Statistics and Information

Captions
Hello! Lots of new stuff is happening with Stable Diffusion and Stable Video Diffusion, courtesy of Stability AI. They have recently released a plethora of new models: last week we saw the introduction of Stable Cascade, and now they have just announced Stable Diffusion 3. Let me share some samples of the upcoming SD3 models. As you can see in this demonstration from Stability AI, it's really cool: you can seamlessly switch objects and backgrounds and even change forms. For example, you can transform a window into a fish tank, swap food on a table, and even turn a raccoon into a cat. The best part is that the styles remain consistent; even in animations that last just a few seconds, the objects maintain their form throughout. It's fascinating to witness these new updates with Stable Diffusion 3. I've already submitted the form to be on the waiting list for early beta testing, and I'm really excited to see what it's capable of and whether there are any new improvements or features we can utilize in Automatic1111 or ComfyUI.

Now let's talk about testing the other video diffusion model from Stability AI, which is Stable Video Diffusion 1.1. But before diving into the testing phase, I recommend checking out Twitter and searching the hashtag #SD3. There you'll find a lot of early testers who are already on the list for testing SD3, and they have generated some incredibly cool images. From what I've seen so far, the image quality from SD3 is really impressive: the text in the images, like the robot banner, can actually be read. Previously we tested Stable Cascade, which could also handle text on images, but sometimes it wasn't entirely accurate. I believe Stable Cascade served as a stepping stone for Stable Diffusion 3; with the launch of SD3, I expect the text rendering to be very stable and accurate, just like in the example of the shirt with the SD3 logo on top. This opens up exciting possibilities for e-commerce businesses, especially those in the print-on-demand industry: they can use SD3 to design their t-shirts, sweaters, and other merchandise by printing logos or any desired text on them. It's truly a game changer, and I'm really looking forward to it. Even in poster styles, like the one shown below, there are some magical text effects with the words "Stable Diffusion 3": the letters have a captivating style, the spelling is correct, and it totally follows the text prompt. I think this is the mature stage beyond Stable Cascade and SDXL; SD3 is going to be really good at this. The realism styles are also very nice in the generated images other people have posted on X, so that is going to be really awesome. I'm looking forward to testing it soon.

Scrolling up a little on the X feed, I saw a post from about an hour ago by Stability AI's CEO and founder. He said they want to make a large, open Stable Video model similar to Sora, but they need more data and compute power for that. It will basically require more input data from people in open-source communities, plus more computing power to run all the AI algorithms, in order to achieve more realistic, lifelike generated videos like Sora. I think that is pretty true: I believe Stability AI has the technology to do it, but the only way to generate good-quality results is with good hardware and more data to feed into the AI models, so the models can understand objects and regenerate them from text prompts or images into videos. That will be really cool, so I'm looking forward to that as well.
Right now we have StableVideos.com, which has officially opened, and you can try it out. This is one way to use Stable Video, and currently Stable Video Diffusion 1.1 is deployed on this website. As you can see, the model card is already available on Hugging Face, so you can download it and run it offline on your local machine, or use StableVideos.com to generate video animations with the 1.1 model right now. I'm going to test both methods.

First, I will try StableVideos.com. The first thing you need to do is register and sign up, or if you have a Google account you can log in with that as well. The first time you log in you will see the terms of service, and you have to agree to them before you can use the platform; take a look and accept them if you agree. On the main page you will see options to start with an image or start with text. Similar to Stable Video Diffusion 1.1 on Hugging Face, there are two methods: starting with an image to generate videos, or starting with text to generate videos. It's clearly labeled "powered by Stable Video Diffusion 1.1" here; it's not Stable Video 2, which will be released later on.

Scroll down a little and you'll see a Community Showcase that displays high-quality video results cherry-picked from the community. For example, there's an elephant with camera-panning styles, and there's an object floating in a cave, which was actually generated from text to video and is very high quality. But if you take a closer look at the pattern inside the cave, you'll notice there is still room for improvement; maybe that will be addressed in Stable Video 2, but we don't know yet. So far I have observed that most of the image-to-video clips mostly involve camera panning or motion effects; there are very few instances where the objects or characters in the videos are actually moving. So if you use image to video, you can expect mostly camera-panning or angle-panning motion in the generated results. We will try both image to video and text to video.

As you can see, there are camera-panning settings: you can lock the camera, tilt up and down, orbit, and pan left and right. Below that we have the advanced settings. Here you can use a repeated seed number to reproduce your generated results, and you can set the sampling steps; the maximum is 40 on StableVideos.com. Then there's the motion strength setting, which is the motion bucket ID, similar to what we set in the ComfyUI workflow. So we basically have all the settings from Stable Video Diffusion here. Below the camera-panning settings you can see your balance: 150 credits per day, which refresh every day, so you can use it for free within that amount; if you want to generate more, you'll need to top up your credits.

Now let's try out some examples. When I logged into the community dashboard, this one caught my eye because I like this kind of medieval style, reminiscent of Assassin's Creed. Here's a text prompt from the examples, and I'm going to use it as a simple showcase for text to video. You can also click the "try sample prompts" button to randomly generate other text prompts. Let's pick this one and click Generate, then wait for the result of the first step, which is generating the image.
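As an aside: the site's backend isn't public, but the same two-step flow and the same knobs (seed, sampling steps, motion strength) can be sketched with Hugging Face's diffusers library. This is a minimal sketch under my own assumptions, not StableVideos.com's actual implementation; the prompt, file names, and parameter values are purely illustrative.

```python
# Rough two-step sketch: text -> image (SDXL), then image -> video (SVD).
# Assumes a CUDA GPU with enough VRAM; model IDs and values are illustrative,
# not what StableVideos.com actually runs.
import torch
from diffusers import StableDiffusionXLPipeline, StableVideoDiffusionPipeline
from diffusers.utils import export_to_video

device = "cuda"

# Step 1: text to image, analogous to generating the candidate stills.
t2i = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
).to(device)
image = t2i(
    prompt="a hooded figure walking through a medieval stone alley",
    height=576, width=1024,  # SVD expects roughly 1024x576 input
).images[0]

# Step 2: image to video. The web UI's knobs map roughly onto these args:
#   seed            -> the torch.Generator seed (repeatable results)
#   sampling steps  -> num_inference_steps (the site caps this at 40)
#   motion strength -> motion_bucket_id (higher = more motion)
i2v = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16, variant="fp16",
).to(device)
generator = torch.Generator(device).manual_seed(42)
frames = i2v(
    image,
    num_frames=25,            # the recommended frame count for SVD
    num_inference_steps=30,
    motion_bucket_id=127,
    decode_chunk_size=8,      # trade VRAM for speed when decoding
    generator=generator,
).frames[0]
export_to_video(frames, "medieval_walk.mp4", fps=7)
```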
This follows the same concept as the ComfyUI workflow for Stable Video Diffusion: if you start from text, the first step is text to image and the second step is image to video. The same concept applies in this interface. We see the text prompt area, and after generating four images you can choose one of them to proceed with the second step, image to video. In this case I will choose the one where the character is walking and the camera is facing the front, since there is already another one in the Community Showcase where the camera faces from the back; let's try a different angle. The second step involves choosing the image and then selecting the camera settings: you can lock the camera, zoom in and out, and tilt up and down for different camera movements. Here I'll apply a zoom-in effect.

After a few minutes of generation, the result is a walking character in a smooth scene. It looks pretty nice, much smoother compared to the previous version, Stable Video Diffusion 1.0. However, when you look closely at the details, you'll notice that the bricks on the wall are not consistent and don't stay still. Even on the ground, the bricks are not in a fixed position: with each frame you can see them moving as the character walks and the camera pans away from the wall. This is an area that can be improved in future Stable Video Diffusion models, and I hope they will work on achieving more consistency. As they mentioned, the newer models are using a transformer architecture, which is also being utilized by Google's AI video model and OpenAI's Sora. This architecture can bring more consistency to characters and objects in videos, allowing them to move smoothly from point A to point B without deformation or flickering issues. That's something we can expect from future Stable Video Diffusion models as well; hopefully we'll see significant improvements in that area sometime this year.

Next, let's move on to testing image to video. You can click on the lower panel, select an image, and upload it here; simply click the Generate button and it will generate video scenes from the image. However, one thing to consider when using this feature is that image to video will not produce a lot of motion from characters or objects. Most of the time it will involve camera panning from left to right, or motion in the background scenery, while the characters or objects usually remain still in one position. This is a common occurrence with image to video, not only in Stable Video but also in other AI video generators like Runway ML and Pika Labs; camera-panning motion is often the focus in these cases. So let's try it out: click camera lock on this one, click Generate, and see what happens. Here's the output result, and it's what I expected from image to video: the objects stay still without moving, the background moves from left to right, and it feels like the camera is circling around the subject. Everything stays in place while the female engineer is fixing the robot.

Another way to use Stable Video Diffusion is to download it from Hugging Face. They have the model card there, labeled as version 1.1. Click on the "Files and versions" tab and you'll see the svd_xt_1_1 safetensors file; basically just download this one, which is 4.78 GB. Where do you save it? Go to ComfyUI, then the models folder, then the checkpoints folder. You can save it right there, but I like to organize my files, so I create a subfolder for SVD and save it in there.
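If you prefer the command line over clicking through the model card, a small script can fetch the checkpoint into that folder. A hedged sketch: the repo ID below matches Stability AI's published 1.1 model card (the repo is gated, so accept the license on the model page and log in with `huggingface-cli login` first), and the destination path is just my suggested folder layout.

```python
# Minimal sketch: fetch the SVD 1.1 checkpoint into ComfyUI's checkpoint folder.
# The 1.1 repo is gated on Hugging Face: accept the license on the model page
# and run `huggingface-cli login` before this will work.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="stabilityai/stable-video-diffusion-img2vid-xt-1-1",
    filename="svd_xt_1_1.safetensors",
    local_dir="ComfyUI/models/checkpoints/SVD",  # my folder layout, adjust as needed
)
print("saved to", path)
```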
Now I'm going to test versions 1.0 and 1.1 of SVD. You can use a very simple workflow here, but I'll use the SVD workflow I created in a previous tutorial. My workflow generates from text to video in one diagram, and I have divided it into two steps for creating these video animations. One of the groups is image to video, which takes the image and generates the video as the result. As you can see, I am using 25 video frames, which is the recommended number for Stable Video Diffusion, and 30 sampling steps. Let's run it once and see what happens. The first run takes longer because it has to load the model files; the output currently showing is a previous example I generated before recording this video to make sure everything works, and it looks pretty good.

Okay, that took about four minutes to create the scene, and this is another generation from the image. Take a close look: this was created with Stable Video Diffusion 1.1 and it looks very good. The trees are not deformed, and it detected that the water and the clouds are the only objects that should move in the image.

Now let's duplicate these groups so I can test Stable Video Diffusion 1.0 side by side. The first group on the top left is text to image; that's a very simple step, and then we connect the output image to the group that generates image to video. In the checkpoint loader I select svd_xt, which is Stable Video Diffusion 1.0, the first version of this model. Then we check whether anything else needs to be set before running; I'm not going to save the output examples here. Let's try other text prompts to generate different images for the comparison. I'm going to try another fantasy-style artwork image, something very futuristic. Here we have a girl in a blue dress, the moon, and some stars in the sky; let's see what the two models can generate in image to video.
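If you'd rather script this side-by-side test than duplicate node groups in ComfyUI, the same comparison can be sketched with diffusers: run the same input image and seed through both checkpoints with the 25-frame, 30-step settings used above. This is a minimal sketch under my own assumptions, not the ComfyUI workflow itself; the input file name is illustrative and the 1.1 repo is gated.

```python
# Sketch: run the same image and seed through SVD 1.0 and 1.1 for comparison.
# Assumes both model repos are accessible (1.1 is gated) and a CUDA GPU.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import export_to_video, load_image

image = load_image("fantasy_girl_blue_dress.png")  # illustrative input file
image = image.resize((1024, 576))                  # SVD's expected resolution

repos = {
    "svd_1_0": "stabilityai/stable-video-diffusion-img2vid-xt",
    "svd_1_1": "stabilityai/stable-video-diffusion-img2vid-xt-1-1",
}
for name, repo in repos.items():
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        repo, torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    generator = torch.Generator("cuda").manual_seed(42)  # same seed both runs
    frames = pipe(
        image,
        num_frames=25,           # recommended frame count
        num_inference_steps=30,  # matches the ComfyUI workflow above
        decode_chunk_size=8,
        generator=generator,
    ).frames[0]
    export_to_video(frames, f"{name}.mp4", fps=7)
    del pipe
    torch.cuda.empty_cache()  # free VRAM before loading the next checkpoint
```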
There you go: we have the green group labeled SVD 1.1 and the blue one labeled SVD 1.0. Both of them generate successfully, both detect the camera panning from left to right, and the woman's hair moves a little. You can see there's more detail on the dress, and the stars actually keep their form in SVD 1.1, whereas in 1.0 the shiny dots on the dress and the stars start to fade out in the last second. In the 1.1 model the output remains steady, so the stars keep appearing even in the final second.

Here's another example, generating a landscape view: a Hong Kong harbor nighttime scene with fireworks on top, simulating New Year's night with fireworks and all the city lights. It's a similar setup, and hopefully we can see some different results between 1.1 and 1.0. Here's the result from both models, the green one being 1.1 and the blue one 1.0. As you can see, both results are very similar images of the nighttime Hong Kong harbor, and you can see the fireworks glowing in the sky. But the 1.1 version is clearly performing better: the fireworks stay visible, while in 1.0 they start to fade out after one second of the animation. So 1.0 does understand what it's trying to do, as far as I can see, but there's less water reflection on the harbor side; in the 1.1 version the water in the harbor is very clear, you can see the waves moving, and the reflection of the fireworks' light is more detailed. I will keep playing around with Stable Video Diffusion, and this is the workflow we used to compare the same image in 1.1 and 1.0. I hope you got some inspiration from what's happening with Stable Video Diffusion; I believe it will improve a lot in the future. Hope you guys like it, have a nice day, see you, bye!
Info
Channel: Future Thinker @Benji
Views: 3,226
Keywords: stable diffusion, stable diffusion tutorial, stable video diffusion, Stable Diffusions 3, Stable Video Diffusions 1.1, Stability AI, new models, object transformations, background changes, form changes, text recognition, Stable Cascade, early beta testing, Automatic 1111, Comfy UI, StableVideos.com, e-commerce, print-on-demand industry, text effects, image-to-videos, text-to-videos, camera panning, motion effects, image generation, video generation, transformer architecture
Id: Ye2-GwzjMN8
Length: 17min 9sec (1029 seconds)
Published: Sun Feb 25 2024