FREE and Unlimited Text-To-Video AI is Here! πŸ™ Full Tutorials (Easy/Med/Hard)

Video Statistics and Information

Captions
Text-to-video is finally becoming a reality, and some of the things I've been seeing created by people using text-to-video are absolutely incredible. I'm going to show you two different products: one is closed source and really impressive; the other is a brand new open source project that you can run on your local computer or on Google Colab. I'm going to show you all of these. Let's go.

First up is RunwayML's Gen-2 product. Gen-2 has been in the works for a while and spent some time in private beta, but now anybody can use it. It's free, but you're limited in the number of seconds of video you can generate. Let's try it out: I'm going to enter "ducks on a lake" and hit generate. You can see in this corner I now have 82 seconds of video left, and it says each second of video generation uses five credits, with 410 credits remaining. Gen-2 is definitely on the cutting edge of text-to-video and does outperform everything else. And here we go, it's done. Each video is about four to five seconds; let's play it. I mean, that looks pretty good. There's not a lot of movement, but it certainly looks very accurate. This duck looks like it has two heads, but overall, for text-to-video in its earliest stages, this is impressive. So play around with it; you can get it at runwayml.com. It's free, and I think you get new credits every month, but after that you do have to pay. As for pricing, it's twelve dollars per editor per month: you get upscaled resolution, watermark removal, shorter wait times, and 125 seconds of generated video every month. That may not sound like a lot, but the amount of processing power it takes to make these videos is substantial, as you'll see shortly when I run it on my local machine.

Next is an open source text-to-video project by potat1, and I'll drop links to all of these in the description below. This is the Hugging Face page, and if we scroll to the bottom we can go to their GitHub page, which lists a bunch of different Google Colab notebooks that use different text-to-video libraries. I'm going to use the zeroscope v1.1 text-to-video Colab. Here it is; I already started running it. The first thing you need to do is click this play button, which installs all the libraries you need and clones the two repos. It really could not be easier. Then, down here is where we enter our prompt. For the prompt I'm going to say "ducks on a lake", similar to Gen-2; no negative prompt; number of steps 33, which I'll leave as-is; guidance scale 23; frames per second, which I'll also leave; and number of frames 24.
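For anyone curious what those Colab form fields map to in code, here is a rough sketch of the same generation step using the Hugging Face diffusers API. The model id and the exact return shape are assumptions on my part; the notebook's own scripts may differ.

```python
# A rough equivalent of the Colab's generation step, using the Hugging Face
# diffusers text-to-video pipeline. The model id below is an assumption (the
# video uses a zeroscope v1.1 notebook); substitute whichever checkpoint
# your notebook actually loads.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "cerspense/zeroscope_v2_576w",  # assumed model id
    torch_dtype=torch.float16,
).to("cuda")

result = pipe(
    prompt="ducks on a lake",
    num_inference_steps=33,  # "number of steps" in the Colab form
    guidance_scale=23,       # matches the Colab default shown
    num_frames=24,           # roughly one second of video at 24-30 fps
)

# On recent diffusers versions, .frames is a list of videos; older versions
# return the frame list directly, so adjust the indexing to your install.
frames = result.frames[0]
export_to_video(frames, "ducks_on_a_lake.mp4", fps=30)
```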
Now here's a big limitation: at 24 total frames and 30 frames per second, this comes in at less than one second of video. You can certainly increase it, but what I've found is that if you increase it too much, first you run out of memory on Google Colab, and second the quality degrades really quickly. I'm still trying to figure out how to maintain quality in longer videos, because on my local machine I can actually create longer ones since I have a pretty beefy GPU. Once I figure that out, I'll make an update video and show you. For now, let's run it: push play, and here we go. It's going to give us a warning; that's okay, we can ignore it. This does take a little while, and here you can actually see it running and processing each frame. It says we're at about two seconds per "it"; I think that means iteration, but I'm not sure, so if you know, leave a comment below and let me know. Okay, it's finished, and you'll see a little check mark. To find the video you just created, click the little folder icon on the left side, then go to outputs, and here it is. I'm going to right-click, choose download, and save it to my desktop. Let's open it up and see how it looks. There it is. Again, it's only one second of video; let's have it on repeat. It's pretty comparable to Gen-2, but you can't make very long videos, and I'm going to show you why.

Okay, next I'm going to show you how to get this running locally. I'm on a Windows machine with an Nvidia GPU, so that's what I'll be using. The first thing you'll need is Anaconda, which handles Python version management and alleviates all those Python version and module version mismatch issues. I know a lot of you struggle with that; I do too, so please use Anaconda, it makes things so much easier. The first thing I'm going to do is create a folder called content. I'm naming it content2 because I already have a content folder, but you can name it whatever you want. From there, we create our conda environment using Python version 3.10.11, which is what I've found works with all of the machine learning and AI libraries we need, and also works with CUDA. Hit enter. It gives me a warning asking whether I want to remove the existing environment; yes I do, though you probably won't see that. Then it asks me to proceed with installing all of these new packages; yes I do, and there we go. Next I highlight this line and activate our conda environment with conda activate myenv, hit enter, and there we can see myenv in the prompt. Then I make sure we have all of the torch libraries necessary to run this, so I enter conda install pytorch torchvision torchaudio. We may not need torchaudio, but I included it because I was also working with a text-to-audio library, so I'm going to go ahead and install it. All of these commands will be linked in the description below. I confirm that yes, I want all of these installed. All right, it's finished. The next thing we do is clone the two repos we need to get this running: first we clone the Text-To-Video-Finetuning library, hit enter, and it's done; next we clone the model itself, and this is a git clone grabbing it from Hugging Face. Okay, that's done; it took a little while. All of these setup commands are collected in the sketch below.
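Here is the local setup narrated above, collected in one place. The environment name, folder name, repo URL, and model repo are my reconstruction of what's said in the video rather than commands confirmed on screen; substitute your own names and paths where they differ.

```bash
# Create a working folder and the conda environment used in the video.
mkdir content2 && cd content2

conda create -n myenv python=3.10.11
conda activate myenv

# torchaudio isn't strictly required here; it's included as in the video.
# For GPU support you may also need the pytorch channel and a pytorch-cuda
# package; check pytorch.org for the exact command for your CUDA version.
conda install pytorch torchvision torchaudio

# Clone the fine-tuning repo (assumed to be ExponentialML's, which these
# zeroscope tutorials commonly use).
git clone https://github.com/ExponentialML/Text-To-Video-Finetuning.git

# Clone the model weights from Hugging Face (replace with the zeroscope
# model repo shown in the video).
git clone https://huggingface.co/cerspense/zeroscope_v2_576w
```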
Next, we change directory into the Text-To-Video-Finetuning folder, and from there run pip install -r requirements.txt, which installs all the modules these scripts need. Okay, that finished. One thing I want to do before running the inference script is make sure CUDA is installed and working, so I run a little checker script that verifies we have the right versions of torch and CUDA and that CUDA is available (a minimal version of this checker is sketched just after this transcript). I run python checker.py, and there we go: it prints the versions and reports that CUDA is available. The last thing to do is run the inference file: python inference.py, passing in a bunch of different arguments, and we want to make sure we enter the correct paths to the model and the repo. To do that, we come in here, right-click the model folder, and choose "copy as path"; that goes in the first argument, -m, so I just paste it in there. Next we need the output folder, which goes right here, and I want to make sure that outputs folder actually exists. I look in the directory, there's no outputs folder, so I create a new folder called outputs. Now it should work: hit enter, and there we go, it's running, and if we look at our monitor we can see the GPU working.

And that's it, so let's take a look at the result. We go to the outputs folder, and there it is: ducks on a lake. I think this looks really good; the only problem is it's only one second. We can start to increase the length, but what I've found is that past two seconds of video we start to see severe degradation in quality. I jumped into the project's Discord, and that's because, they said, the models are trained on one-to-two-second videos, which makes a lot of sense. They're working on this problem right now, and in fact they suggested a new model I should try; that model can be found right here. I haven't tried it yet, but if I get it working I'll create another video on how to do that. Now let me show you one more run at 48 frames: we change that last parameter to 48, hit enter, and there it goes. All right, it's finished; let's take a look. Here's the second one, and this is two seconds now; it still looks pretty good. Now let me show you what happens when we move up to three seconds. Okay, it's done; let's take a look. It actually still looks pretty decent, but you can tell the ducks are starting to pop in and out of nowhere, and once we increase it beyond this we see a complete degradation of video quality. They're working on it, though, and the progress is so exciting. Hopefully you get this working; if you need any help, jump into my Discord, I'm happy to help out. Also jump into camenduru's Discord, they'll help you out as well. There are a bunch of different models you can try for text-to-video, and some will do better than others, but this is great progress, and it's completely local and open source. If you liked this video, please consider giving it a like and subscribing, and I'll see you in the next one.
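The checker script mentioned above isn't shown on screen. A minimal equivalent of what it verifies (an assumption on my part, not necessarily the repo's exact checker.py) would be:

```python
# checker.py -- a minimal stand-in for the CUDA sanity check run in the video.
# Prints the torch and CUDA versions and whether CUDA is actually usable.
import torch

print("torch version:", torch.__version__)
print("CUDA version:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # Confirm which GPU torch will use for inference.
    print("GPU:", torch.cuda.get_device_name(0))
```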
Info
Channel: Matthew Berman
Views: 37,604
Keywords: text2video, text to video, text-to-video, text to video ai, artificial intelligence, ai, deep learning, stable diffusion text2video, modelscope text2video, stable diffusion, stable diffusion tutorial
Id: JuSU7VTlmII
Length: 8min 9sec (489 seconds)
Published: Mon Jun 12 2023